Noppanit

01 Jan 2013

# What if matching exact string is not enough, basic fuzzy search that you want to know.

First of all, I’m not a researcher what I post here is just my experiment that I do it myself.

I have this problem when I want to match string but not exact match. For example, I have this string “I really love The Matrix movie because of this quote ‘welcome to the real world’” and I have a bunch of tags in my database such as “The Matrix”, “welcome to the real world”, “red pill”, “blue pill”. So, if I want to match the tags with the string I could do regular expression like this or actually I could do the exact match and still get the correct one.

/.*world.*/


That would give me what I want. However, what if I have another movie “The day after tomorrow” with tags “world”. If I search with word that string against the tags I will get two movies. So, what can I do? I did some Google and found fuzzy search and I picked “Levenshtein”. I’m not good at maths so I can’t really explains in technical correctly. But basically, it’s a technique to look into two strings and count every time you have to “edit”</strong> to make the strings exact match. For example,

“I love you” # string 1

“I also love you” # string 2


The distance would be 1 because it takes 1 edit to make the two strings exact match. We have to add also or delete also 1 time.

So from the example above.

# Distance 12
puts Levenshtein.distance("welcome to the real world".split(" "), "I really love The Matrix movie because of this quote 'welcome to the real world'".split(" "))
# Distance 15
puts Levenshtein.distance("world".split(" "), "I really love The Matrix movie because of this quote 'welcome to the real world'".split(" "))



So this way, the tag welcome to the real world is more likely to match with the string I really love The Matrix movie because of this quote ‘welcome to the real world

I use this Gem for this experiment

Til next time,
noppanit at 00:00