PHP String Similarity: similar_text() vs levenshtein() and Why Combining Them Works Best
When building intelligent applications in PHP—whether it’s for search suggestions, spelling corrections, or deduplication—accurately comparing strings is essential. But string similarity isn’t as simple as “equal or not equal.” Sometimes, it’s about how similar two strings are.
PHP provides two native functions to tackle this problem: similar_text() and levenshtein(). While both compare strings, they serve very different purposes, and understanding when and how to use them (or both together) can give your application a serious edge.
In this deep dive, we’ll explore each function in detail, compare their mechanics and performance, and show you how to build a hybrid similarity algorithm for more robust fuzzy matching.
📌 What Is similar_text()?
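similar_text() compares two strings by finding their longest common substring, then repeating that search on the unmatched text to its left and right; the result is the total number of matching characters. It can also return a similarity percentage, which makes it well suited to human-perceived similarity.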
Syntax:
similar_text(string $string1, string $string2, float &$percent = null): int
- Returns the number of matching characters.
- Optionally fills a variable with a percentage similarity (0–100%).
Example:
similar_text("hello", "hallo", $percent);
echo $percent; // Outputs: 80
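To see how the match count and the percentage relate, the sketch below (assuming PHP 8 and plain ASCII input) captures both values and recomputes the percentage by hand. The formula used, twice the match count divided by the combined length of both strings, times 100, is how PHP's own implementation derives the percentage.

```php
<?php
// similar_text() returns the number of matched characters and,
// through the optional third argument, a percentage.
$a = "hello";
$b = "hallo";
$matches = similar_text($a, $b, $percent);

// Percentage = matches * 2 * 100 / (combined length of both strings).
$manual = $matches * 2 * 100 / (strlen($a) + strlen($b));

echo $matches, PHP_EOL; // 4
echo $percent, PHP_EOL; // 80
echo $manual, PHP_EOL;  // 80
```

Here the common substring “llo” is found first, then “h” is matched in the remaining prefixes (“he” vs. “ha”), for 4 matching characters out of 10 in total.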
📈 Use Cases:
- Suggesting related tags, categories, or keywords
- Duplicate detection (e.g. blog titles, names)
- Comparing user inputs for similarity
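For the duplicate-detection use case, a minimal sketch might compare every pair of titles and flag those above a threshold. The function name find_near_duplicates() and the 85% cutoff are hypothetical choices for illustration, not part of PHP, and lowercasing first assumes that case differences should not count.

```php
<?php
/**
 * Hypothetical helper: returns pairs of titles whose similar_text()
 * percentage meets a threshold. The 85% default is an assumption to tune.
 */
function find_near_duplicates(array $titles, float $threshold = 85.0): array
{
    $pairs = [];
    $count = count($titles);
    for ($i = 0; $i < $count; $i++) {
        for ($j = $i + 1; $j < $count; $j++) {
            // Case-insensitive comparison; similar_text() itself is case-sensitive.
            $a = strtolower($titles[$i]);
            $b = strtolower($titles[$j]);
            // similar_text() can depend on argument order, so keep the higher score.
            similar_text($a, $b, $forward);
            similar_text($b, $a, $backward);
            $score = max($forward, $backward);
            if ($score >= $threshold) {
                $pairs[] = [$titles[$i], $titles[$j], round($score, 2)];
            }
        }
    }
    return $pairs;
}

$titles = [
    "10 Tips for Faster PHP",
    "10 tips for faster PHP!",
    "Getting Started with Laravel",
];
print_r(find_near_duplicates($titles));
```

Comparing every pair grows quadratically with the number of titles, so larger collections usually need some cheaper pre-filtering before this step.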
📌 What Is levenshtein()?
levenshtein() calculates the minimum number of single-character edits (insertions, deletions, substitutions) required to transform one string into another. This is also known as the edit distance or Levenshtein distance.
Syntax:
levenshtein(string $string1, string $string2, int $insertion_cost = 1, int $replacement_cost = 1, int $deletion_cost = 1): int
- Returns an integer representing how many character edits are required.
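- Lower numbers mean more similar strings; 0 means the strings are identical.
- Before PHP 8.0, it returned -1 if either string was longer than 255 characters.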
Example:
echo levenshtein("kitten", "sitting"); // Outputs: 3
It takes three edits to turn “kitten” into “sitting”: substitute k with s, substitute e with i, and insert g at the end.
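levenshtein() also accepts optional per-operation costs, in the order insertion, replacement, deletion. The sketch below (assuming PHP 8) contrasts the default call with one where a substitution costs 2, and shows that the comparison is case-sensitive.

```php
<?php
// Default costs: each insertion, substitution and deletion costs 1.
$default = levenshtein("kitten", "sitting");

// Custom costs in PHP's parameter order: insertion, replacement, deletion.
// A replacement cost of 2 makes a substitution no cheaper than a
// deletion followed by an insertion.
$weighted = levenshtein("kitten", "sitting", 1, 2, 1);

// The comparison is case-sensitive: "K" -> "k" is one substitution.
$caseOnly = levenshtein("Kitten", "kitten");

echo $default, PHP_EOL;  // 3
echo $weighted, PHP_EOL; // 5
echo $caseOnly, PHP_EOL; // 1
```

With a replacement cost of 2, only insertions and deletions are worth using, so the result becomes 6 + 7 − 2 × 4 = 5, where 4 is the length of the longest common subsequence “ittn”.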
📈 Use Cases:
- Spell check and correction
- Fuzzy matching in search queries
- Detecting misspellings in user-generated content
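For spell correction, a common pattern is to pick the dictionary word with the smallest edit distance to the input. The sketch below follows that pattern; suggest_correction() and its default cutoff of 2 edits are hypothetical, and the short word list stands in for a real dictionary.

```php
<?php
/**
 * Hypothetical helper: returns the dictionary word with the smallest
 * levenshtein() distance to $input, or null if none is within $maxDistance.
 * The default cutoff of 2 edits is an assumption; tune it for your data.
 */
function suggest_correction(string $input, array $dictionary, int $maxDistance = 2): ?string
{
    $input = strtolower($input);
    $best = null;
    $bestDistance = PHP_INT_MAX;

    foreach ($dictionary as $word) {
        $distance = levenshtein($input, strtolower($word));
        if ($distance === 0) {
            return $word; // exact match, nothing to correct
        }
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $word;
        }
    }

    return $bestDistance <= $maxDistance ? $best : null;
}

$dictionary = ["apple", "banana", "orange", "grape"];
echo suggest_correction("banan", $dictionary), PHP_EOL;  // banana
echo suggest_correction("oarnge", $dictionary), PHP_EOL; // orange
var_dump(suggest_correction("kiwi", $dictionary));       // NULL
```

The cutoff keeps the function from “correcting” words that have no plausible match, such as “kiwi” against this list.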
similar_text() vs levenshtein(): Key Differences
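| Feature | similar_text() | levenshtein() |
|---|---|---|
| Purpose | Human-readable similarity scoring | Structural difference (edit distance) |
| Output | Match count (returned) + % similarity (by reference) | Number of edits required |
| Performance | Slower (O(N³), N = length of the longer string) | Faster (O(N×M)) |
| Case Sensitivity | Yes | Yes |
| Tolerant of Transpositions | Somewhat | No (a swap costs two edits) |
| Argument Order | Can change the result | Does not matter with default costs |
| Multibyte (UTF-8) Aware | No, compares bytes | No, compares bytes |
| Ideal For | UX features, similarity suggestions | Typo detection, spell correction |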
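The differences in the table can be checked directly. The sketch below (assuming PHP 7.0+ for the \u{...} string escape) exercises case sensitivity, a transposition, argument order, and byte-level comparison of UTF-8 text.

```php
<?php
// Case sensitivity: both functions compare bytes exactly.
$caseText = similar_text("Hello", "hello"); // only "ello" matches
$caseLev  = levenshtein("Hello", "hello");  // one substitution (H -> h)

// Transposition: levenshtein() has no swap operation, so "ab" -> "ba"
// costs two substitutions.
$swapLev = levenshtein("ab", "ba");

// Argument order: similar_text() can return different counts when
// the arguments are swapped.
$forward  = similar_text("bafoobar", "barfoo");
$backward = similar_text("barfoo", "bafoobar");

// Byte-based: "é" (U+00E9) is two bytes in UTF-8, so replacing it
// with "e" counts as two edits, not one.
$bytesLev = levenshtein("caf\u{e9}", "cafe");

echo implode(", ", [$caseText, $caseLev, $swapLev, $forward, $backward, $bytesLev]), PHP_EOL;
// 4, 1, 2, 5, 3, 2
```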
Building a Hybrid String Similarity Function
By combining both functions, you can create a more reliable and nuanced similarity score that captures both structure and perception.
Here’s how:
function hybrid_similarity_score(string $str1, string $str2): float {
    // Normalize: trim surrounding whitespace and lowercase
    $str1 = strtolower(trim($str1));
    $str2 = strtolower(trim($str2));

    $max_len = max(strlen($str1), strlen($str2));
    if ($max_len === 0) {
        return 100.0; // two empty strings are identical
    }

    // Step 1: similar_text() percentage (0-100)
    similar_text($str1, $str2, $similarity_percent);

    // Step 2: levenshtein() distance, normalized to a 0-100 score
    $lev_distance = levenshtein($str1, $str2);
    $lev_score = (1 - $lev_distance / $max_len) * 100;

    // Step 3: Combine with weights (60% similar_text, 40% levenshtein)
    $combined_score = ($similarity_percent * 0.6) + ($lev_score * 0.4);
    return round($combined_score, 2);
}
🔬 Example Output:
echo hybrid_similarity_score("color", "colour"); // 87.88
echo hybrid_similarity_score("kitten", "sitting"); // 59.78
You can tune the weights (0.6 and 0.4) to fit your domain, favoring human readability or edit distance as needed.
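One way to make the weights easy to tune is to pass the similar_text() share as a parameter and derive the levenshtein() share from it. weighted_similarity() and its $textWeight parameter are hypothetical names for this sketch; the scoring logic follows the hybrid function above, with an explicit guard for two empty strings.

```php
<?php
/**
 * Hypothetical variant of the hybrid score with a tunable weight.
 * $textWeight is the share given to similar_text(); the remainder
 * goes to the normalized levenshtein() score.
 */
function weighted_similarity(string $a, string $b, float $textWeight = 0.6): float
{
    if ($textWeight < 0.0 || $textWeight > 1.0) {
        throw new InvalidArgumentException('$textWeight must be between 0 and 1');
    }

    $a = strtolower(trim($a));
    $b = strtolower(trim($b));
    $maxLen = max(strlen($a), strlen($b));
    if ($maxLen === 0) {
        return 100.0; // two empty strings are identical
    }

    similar_text($a, $b, $textPercent);
    $levScore = (1 - levenshtein($a, $b) / $maxLen) * 100;

    return round($textPercent * $textWeight + $levScore * (1 - $textWeight), 2);
}

echo weighted_similarity("color", "colour"), PHP_EOL;      // 87.88
echo weighted_similarity("color", "colour", 1.0), PHP_EOL; // 90.91
echo weighted_similarity("color", "colour", 0.0), PHP_EOL; // 83.33
```

Setting $textWeight to 1.0 or 0.0 recovers the pure similar_text() percentage or the pure normalized levenshtein() score, which is a handy sanity check while tuning.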
When to Use Each (or Both)
| Use Case | Recommended Approach |
|---|---|
| Autocomplete or Search Suggestions | ✅ similar_text() or Hybrid |
| Spell Checker or Typo Fix | ✅ levenshtein() |
| Duplicate Detection in User Input | ✅ Hybrid |
| Name Matching (Fuzzy) | ✅ Hybrid |
| Content Similarity / Plagiarism Check | ✅ similar_text() (short passages; its cost grows cubically with length) |
Performance Consideration in String Similarity
- similar_text() is significantly slower than levenshtein() on long strings.
- For real-time comparisons (like in a search box), consider using levenshtein() for initial filtering and then apply similar_text() only to the shortlisted candidates.
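The filter-then-rank approach can be sketched as follows. rank_candidates(), the $maxEdits default of 3, and the product list are hypothetical; the sketch assumes PHP 8.0+, because earlier versions of levenshtein() return -1 for strings longer than 255 characters, which this filter would wrongly accept.

```php
<?php
/**
 * Hypothetical two-stage matcher: a cheap levenshtein() pass keeps
 * candidates within $maxEdits, then similar_text() ranks the survivors.
 * The $maxEdits default of 3 is an assumption to tune per dataset.
 */
function rank_candidates(string $query, array $candidates, int $maxEdits = 3, int $limit = 5): array
{
    $query = strtolower($query);

    // Stage 1: filter by edit distance.
    $shortlist = [];
    foreach ($candidates as $candidate) {
        if (levenshtein($query, strtolower($candidate)) <= $maxEdits) {
            $shortlist[] = $candidate;
        }
    }

    // Stage 2: score only the shortlisted strings with similar_text().
    $scored = [];
    foreach ($shortlist as $candidate) {
        similar_text($query, strtolower($candidate), $percent);
        $scored[$candidate] = $percent;
    }

    arsort($scored); // highest percentage first, keys preserved
    return array_slice($scored, 0, $limit, true);
}

$products = ["iPhone", "iPad", "iMac", "Pixel", "Galaxy"];
print_r(rank_candidates("iphne", $products));
```

For the query “iphne”, only “iPhone” (1 edit) and “iPad” (3 edits) survive the first pass, so similar_text() runs on two strings instead of five.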
Final Thoughts
Choosing the right string comparison method in PHP depends on the context and intent behind your comparison.
- Use levenshtein() for performance-critical, structural typo detection.
- Use similar_text() for human-readable similarity judgments.
- Use both together for more robust and nuanced applications, especially when fuzzy matching is core to the user experience.
When used thoughtfully, PHP's string similarity functions can power anything from intelligent search to duplicate content detection and even conversational AI logic.