Page 1 of 1

(how to) find an approximate match for a fact with strings (ok for VIP 7)

Posted: 17 Apr 2006 14:42
by Gildas Menier
The package vp_string adds a predicate that computes an editing distance between two strings (editingdistance(string,string) -> integer).

It returns the number of operations it takes to get from the first string to the second string. Allowed op are insert (add a character), delete (remove one) or replace one character by another character.
A distance of 0 means no differences at all, 1 means there is a difference of one character between the two strings...

You may compute distance with strings up to 100 characters with this predicates.

The included project shows how you can use this predicate to perform an approximate search in a DB.

The best references for this algorithm is :

V.I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 1966. (hard to find)

R.A Wagner and M.J. Fischer. The string to string correction problem. Journal of the ACM, 21:168--173, 1974.


Best Regards

Gildas