Edit Distance

Introduce

Through the following, u will know what's edit distance.

target : you are cute  and  I         love u.

source:   I   am cute          I don't love u.

For the two sentences, u can use edit including insert, delete and substitute the source so that it can be same to the target.

u will subsutitute "you" for "I", which will cost 2

subsutitue "are" for "am", which will cost 2

insert "and", which will cost 1

delete "don't", which will cost 1

so, totally, u will cost 2 + 2 + 1 + 1 = 6, which is the edit distance.

How to calculate the minimum edit distance

Now, we take another example

target : s t o p

source: s o t

u can insert "t", which will cost 1, to get "stot", then u can substitute "p" for "t", which will cost 2, to get "stop".

Besides, u can substitue "t" for "o" to get "stt", then u can substitute "o" for "t" to get "sto", last, u can insert "p" to get "stop"

Alright, u can have plenty of method to edit to get the target, so how to get the minimum edit distance. We will introduce the algorithm as following, which is dynamic programming.

i : the length of the target

j : the length of the source

D(0, 0) = 0

D(i, 0) = insertCost * i, means the length of the target is i and the length of the source is 0, so u should insert i characters

D(0, j) = deleteCost * j, means the length of the target is 0 and the length of the source is j, so u should delete j characters

D(i, j) = min {D(i - 1, j) + insertCost(target[i]);

                    D(i - 1, j - 1) + substituteCost(source[j], target[i]);

                    D(i, j - 1) + deleteCost(source[j])} means u should choose the method which will cost least.substituteCost = {0, 2}, if source[j] is equal to target[i], it's 0, or it's 2.

insertCost = 1

deleteCost = 1

Well, maybe u will feel confuse, but no problem, We will use the algorithm to solve the above question.

firstly, u can gain the belew form.

 

D(1, 1) = min {D[0, 1] + insert(t[1]) = 2;

                      D[0, 0] + substitute(s[1], t[1]) = 0;

        D[1, 0] + delete(s[1]) = 2}          =    0

D[1, 2] = min{ D[0, 2] + insert(t[1]) = 3;

        D[0, 1] + substitute(s[2], t[1]) = 3;

        D[1, 1] + delete(s[2] = 1)}            =   1

................

then u will get the last number

D[4, 3] = min {D[3, 2] + insert(t[4]) = 3;

        D[3, 2] + substitute(s[3], t[4]) = 3;

        D[4, 2] + delete(s[3]) = 3}    =    3

Actually, u can get the path how to get edit through every step. The below picture show the results.

原文地址:https://www.cnblogs.com/chuanlong/p/2993774.html