KMP Demo

The key of Kmp is to build a look up table that records the match result of prefix and postfix. Value in the table means the max len of matching substring that exists in both prefix and postfix. In the prefix this substring should start from 0, while in the postfix this substring should ends at current index.

For example, now we have a string "ababc"

The KMP table will look liks this:

 a  b  a  b  c
-1  0  1  2  

(Note: we will not match substring with itself, so we will skip index 0)

So how does this table help us search string match faster?

Well, the answer is if we are trying to match a char after posfix with target string and failed, then we can smartly shift the string, so that the matching string in perfix will replace postfix and now we can try to match the char after prefix with this char in target.

Take above string as an example.

Now we try to match string "ababc" with "abababc".

We will initially have match as below

 0 1 2 3 4 5 6
 a b a b a b c (string x)
 a b a b c (string y)
-1 0 1 2 

We found char at index 4 does not match, then we can use look up table and shift the string y wisely. 

We found table[3] = 2, which means we can shift the string y rightward by 2, and still have same but shorter prefix before index 4, like this:

 0 1 2 3 4 5 6
 a b a b a b c (string x)
     a b a b c (string y)
    -1 0 1 2 

If there is a long gap between prefix and postfix, this shift can help us save a lot of time. In the brute force way, we cannot do that because we have no information of the string. We have to compare each possible pair of chars. While in KMP, we know the information of string y so we can move smartly. We can directly jump to the next possible match pairwhile discard useless pair of chars.

We are almost done with KMP, but we still have one special case that needs to be taken care of.

Say now we have a input like this:

 0 1 2 3 4 5
 a a b a a a
-1 1 0 2 3 

How should we build the KMP table for this string?

Say the pointer in prefix is 'x', which is at index 2 now and the pointer in postfix is 'y' which is at index 5 now. We need to match 'b' pointed by x with 'a' pointed by y. It is an unmatched pair, how should we update the cell? 

Well, we really don't need to reset it to 0, that will make us skip a valid shorter matching substring "aa".

What we do now is just to shorten the length of substring by 1 unit and try to match a shorter substring "aa". This can be done by moving pointer x to the index recorded in [indexOf(x) - 1] while keep pointer y stay still. This is because by following the value in KMP table we can always make sure previous part of prefix and postfix is matched even we have shorten their length, so wo only need to care about the char after matched part in prefix and posfix.

Code: [Java]

// JAVA program for implementation of KMP pattern 
// searching algorithm 

class KMP_String_Matching { 
	void KMPSearch(String pat, String txt) 
	{ 
		int M = pat.length(); 
		int N = txt.length(); 

		// create lps[] that will hold the longest 
		// prefix suffix values for pattern 
		int lps[] = new int[M]; 
		int j = 0; // index for pat[] 

		// Preprocess the pattern (calculate lps[] 
		// array) 
		computeLPSArray(pat, M, lps); 

		int i = 0; // index for txt[] 
		while (i < N) { 
			if (pat.charAt(j) == txt.charAt(i)) { 
				j++; 
				i++; 
			} 
			if (j == M) { 
				System.out.println("Found pattern "
								+ "at index " + (i - j)); 
				j = lps[j - 1]; 
			} 

			// mismatch after j matches 
			else if (i < N && pat.charAt(j) != txt.charAt(i)) { 
				// Do not match lps[0..lps[j-1]] characters, 
				// they will match anyway 
				if (j != 0) 
					j = lps[j - 1]; 
				else
					i = i + 1; 
			} 
		} 
	} 

	void computeLPSArray(String pat, int M, int lps[]) 
	{ 
		// length of the previous longest prefix suffix 
		int len = 0; 
		int i = 1; 
		lps[0] = 0; // lps[0] is always 0 

		// the loop calculates lps[i] for i = 1 to M-1 
		while (i < M) { 
			if (pat.charAt(i) == pat.charAt(len)) { 
				len++; 
				lps[i] = len; 
				i++; 
			} 
			else // (pat[i] != pat[len]) 
			{ 
				// This is tricky. Consider the example. 
				// AAACAAAA and i = 7. The idea is similar 
				// to search step. 
				if (len != 0) { 
					len = lps[len - 1]; 

					// Also, note that we do not increment 
					// i here 
				} 
				else // if (len == 0) 
				{ 
					lps[i] = len; 
					i++; 
				} 
			} 
		} 
	} 

	// Driver program to test above function 
	public static void main(String args[]) 
	{ 
		String txt = "ABABDABACDABABCABAB"; 
		String pat = "ABABCABAB"; 
		new KMP_String_Matching().KMPSearch(pat, txt); 
	} 
} 
// This code has been contributed by Amit Khandelwal. 

  

Reference:

https://www.geeksforgeeks.org/java-program-for-kmp-algorithm-for-pattern-searching-2/

https://leetcode.com/problems/shortest-palindrome/discuss/60113/Clean-KMP-solution-with-super-detailed-explanation

永远渴望,大智若愚(stay hungry, stay foolish)
原文地址:https://www.cnblogs.com/h-hkai/p/10711911.html