performance - Efficient implementation of Damerau-Levenshtein distance -


i'm trying implement efficient clojure function compute damerau-levenshtein distance. i've decided use this algorithm (attached source should c++) computing levenshtein distance , add lines make work dld.

here i've created in common lisp (i hope help):

(defun damerau-levenshtein (x y)   (declare (type string x y)            #.*std-opts*)   (let* ((x-len (length x))          (y-len (length y))          (v0 (apply #'vector (mapa-b #'identity 0 y-len)))          (v1 (make-array (1+ y-len) :element-type 'integer))          (v* (make-array (1+ y-len) :element-type 'integer)))     (do ((i 0 (1+ i)))         ((= x-len) (aref v0 y-len))       (setf (aref v1 0) (1+ i))       (do ((j 0 (1+ j)))           ((= j y-len))         (let* ((x-i (char x i))                (y-j (char y j))                (cost (if (char-equal x-i y-j) 0 1)))           (setf (aref v1 (1+ j)) (min (1+ (aref v1 j))                                       (1+ (aref v0 (1+ j)))                                       (+  (aref v0 j) cost)))           (when (and (plusp i) (plusp j))             (let ((x-i-1 (char x (1- i)))                   (y-j-1 (char y (1- j)))                   (val (+ (aref v* (1- j)) cost)))               (when (and (char-equal x-i y-j-1)                          (char-equal x-i-1 y-j)                          (< val (aref v1 (1+ j))))                 (setf (aref v1 (1+ j)) val))))))       (rotatef v* v0 v1)))) 

now, fear cannot translate efficient , idiomatic clojure code (in functional style?). appreciate suggestion , think may quite useful many future readers too.

p.s. i've found this implementation, doubt if efficient , uses obsolete contrib functions (deep-merge-with , bool-to-binary):

(defn damerau-levenshtein-distance   [a b]   (let [m (count a)         n (count b)         init (apply deep-merge-with (fn [a b] b)                     (concat                      ;;deletion                      (for [i (range 0 (+ 1 m))]                        {i {0 i}})                      ;;insertion                      (for [j (range 0 (+ 1 n))]                        {0 {j j}})))         table (reduce                (fn [d [i j]]                  (deep-merge-with                   (fn [a b] b)                   d                   (let [cost (bool-to-binary (not (= (nth (- 1))                                           (nth b (- j 1)))))                         x                           (min                            (+ ((d (- 1))                                j) 1) ;;deletion                            (+ ((d i)                                (- j 1)) 1) ;;insertion                            (+ ((d (- 1))                                (- j 1)) cost)) ;;substitution))                         val (if (and (> 1)                                (> j 1)                                (= (nth (- 1))                                   (nth b (- j 2)))                                (= (nth (- 2))                                   (nth b (- j 1))))                         (min x (+ ((d (- 2))                                    (- j 2)) ;;transposition                                   cost))                         x)]                     {i {j val}})))                init                (for [j (range 1 (+ 1 n))                      (range 1 (+ 1 m))] [i j]))]     ((table m) n))) 

i had write efficient levenshtein distance function in clojure calculate edits between ground truth text , ocr engine result. recursive implementation wasn't performant enough calculate levenshtein distance between 2 whole pages, implementation uses dynamic programming. instead of dropping down java 2d-arrays uses core.matrix handle matrix stuff. adding transposition stuff damerau-levenshtein should not hard.

(defn lev [str1 str2]   (let [mat (new-matrix :ndarray (inc (count str1)) (inc (count str2)))         len1 (count str1) len2 (count str2)]    (mset! mat 0 0 0)    (dotimes [i lein1]      (mset! mat (inc i) 0 (inc i)))    (dotimes [j len2]      (mset! mat 0 (inc j) (inc j)))    (dotimes [dj len2]      (dotimes [di len1]        (let [j (inc dj) (inc di)]          (mset! mat j               (cond                 (= (.charat ^string str1 di) (.charat ^string str2 dj))                 (mget mat di dj)                 :else                 (min (inc (mget mat di j)) (inc (mget mat dj))                     (inc (mget mat di dj))))))))    (mget mat len1 len2)))) 

hope helps


Comments

Popular posts from this blog

javascript - how to protect a flash video from refresh? -

visual studio 2010 - Connect to informix database windows form application -

android - Associate same looper with different threads -