String Similarity in PHP

Updated On:

May 14, 2025

,By

Kishore Sahoo

PHP String Similarity: similar_text() vs levenshtein() and Why Combining Them Works Best

When building intelligent applications in PHP—whether it’s for search suggestions, spelling corrections, or deduplication—accurately comparing strings is essential. But string similarity isn’t as simple as “equal or not equal.” Sometimes, it’s about how similar two strings are.

PHP provides two native functions to tackle this problem: similar_text() and levenshtein(). While both compare strings, they serve very different purposes—and understanding when and how to use them (or both together) can give your application a serious edge.

In this deep dive, we’ll explore each function in detail, compare their mechanics and performance, and show you how to build a hybrid similarity algorithm for more robust fuzzy matching.

📌 What Is `similar_text()`?

similar_text() compares two strings and calculates the number of matching characters in the same order. It can also return a similarity percentage, making it great for human-perceived similarity.

Syntax:

int similar_text(string $string1, string $string2, float &$percent)

Returns the number of matching characters.
Optionally fills a variable with a percentage similarity (0–100%).

Example:

similar_text("hello", "hallo", $percent);
echo $percent; // Outputs: 80

📈 Use Cases:

Suggesting related tags, categories, or keywords
Duplicate detection (e.g. blog titles, names)
Comparing user inputs for similarity

What Is `levenshtein()`?

levenshtein() calculates the minimum number of edits (insertions, deletions, substitutions) required to transform one string into another. This is also known as edit distance or Levenshtein distance.

Syntax:

int levenshtein(string $string1, string $string2)

Returns an integer representing how many character edits are required.
Lower numbers = more similar strings.

Example:

echo levenshtein("kitten", "sitting"); // Outputs: 3

This tells us that you need 3 edits to turn “kitten” into “sitting”.

Use Cases:

Spell check and correction
Fuzzy matching in search queries
Detecting misspellings in user-generated content

`similar_text()` vs `levenshtein()`: Key Differences

Feature	`similar_text()`	`levenshtein()`
Purpose	Human-readable similarity scoring	Structural difference (edit distance)
Output	% similarity + match count	Number of edits required
Performance	Slower (O(N³))	Faster (O(N×M))
Case Sensitivity	Yes	Yes
Tolerant of Transposes	Somewhat	No
Ideal For	UX features, similarity suggestions	Typo detection, spell correction

Building a Hybrid String Similarity Function

By combining both functions, you can create a more reliable and nuanced similarity score that captures both structure and perception.

Here’s how:

function hybrid_similarity_score($str1, $str2) {
    $str1 = strtolower(trim($str1));
    $str2 = strtolower(trim($str2));

    // Step 1: similar_text
    similar_text($str1, $str2, $similarity_percent);

    // Step 2: levenshtein
    $lev_distance = levenshtein($str1, $str2);
    $max_len = max(strlen($str1), strlen($str2));
    $lev_score = ($max_len > 0) ? (1 - $lev_distance / $max_len) * 100 : 0;

    // Step 3: Combine with weights
    $combined_score = ($similarity_percent * 0.6) + ($lev_score * 0.4);

    return round($combined_score, 2);
}

🔬 Example Output:

echo hybrid_similarity_score("color", "colour"); // ~87.14
echo hybrid_similarity_score("kitten", "sitting"); // ~62.29

You can tune the weights (0.6 and 0.4) to fit your domain—favoring human readability or edit distance as needed.

When to Use Each (or Both)

Use Case	Recommended Approach
Autocomplete or Search Suggestions	✅ `similar_text()` or Hybrid
Spell Checker or Typo Fix	✅ `levenshtein()`
Duplicate Detection in User Input	✅ Hybrid
Name Matching (Fuzzy)	✅ Hybrid
Content Similarity / Plagiarism Check	✅ `similar_text()`

Performance Consideration in String Similarity

similar_text() is significantly slower than levenshtein() on long strings.
For real-time comparisons (like in a search box), consider using levenshtein() for initial filtering and then apply similar_text() only on shortlisted candidates.

Final Thoughts

Choosing the right string comparison method in PHP depends on the context and intent behind your comparison.

Use levenshtein() for performance-critical, structural typo detection.
Use similar_text() for human-readable similarity judgments.
Use both together for more robust and nuanced applications, especially when fuzzy matching is core to the user experience.

When used smartly, PHP String Similarity can power anything from intelligent search to duplicate content detection and even conversational AI logic.

Kishore Sahoo

Kishore is based in Bangalore, IT Capital of India, specialising in PHP frameworks and CMS where he provides Handcrafted Websites and Applications based on WordPress or Laravel as backend technology & building & deploying powerful Application on the cloud. Specialties: Problem-Solving, Solution Architectures, Startup Initiative, MVP, WordPress, PHP

String Similarity in PHP

📌 What Is `similar_text()`?

Syntax:

Example:

📈 Use Cases:

What Is `levenshtein()`?

Syntax:

Example:

Use Cases:

`similar_text()` vs `levenshtein()`: Key Differences

Building a Hybrid String Similarity Function

🔬 Example Output:

When to Use Each (or Both)

Performance Consideration in String Similarity

Final Thoughts

Crazy about CRO?

Leave a Reply Cancel reply

String Similarity in PHP

📌 What Is similar_text()?

Syntax:

Example:

📈 Use Cases:

What Is levenshtein()?

Syntax:

Example:

Use Cases:

similar_text() vs levenshtein(): Key Differences

Building a Hybrid String Similarity Function

🔬 Example Output:

When to Use Each (or Both)

Performance Consideration in String Similarity

Final Thoughts

Crazy about CRO?

Leave a Reply Cancel reply

📌 What Is `similar_text()`?

What Is `levenshtein()`?

`similar_text()` vs `levenshtein()`: Key Differences