『Knuth-Morris-Pratt Algorithm with its Automaton』

Let's define (pi) function to descripe the longest common prefix and suffix ( also called "border" ) of substring (s_i) :

[pi (s_i)=max_{s(1:t)=s(i-t+1:i)}{t} ]

We appoint that (pi^{(t)}(s)) means (pi(s_{pi^{(t-1)}(s)})), and (pi^{(1)}(s)=pi(s)). Then define

[eta = min t ext{ s.t. } s( pi^{(t)}(s_{i-1})+1)=s(i) ]

Obviously, (pi(s_i)=pi^{(eta)}(s_{i-1})+1),so we can get (pi(s_{1sim |s|})) in (mathcal{O}(|s|)) time to solve the problem.

Of course, if we can't find a positive integer (eta) satisfying the condition, let (pi(s_i)=0) : )

#include <bits/stdc++.h>
using namespace std;
const int N = 1e6 + 20;
int n, m, Fail[N]; char s[N], t[N];
int main(void) {
	scanf( "%s", s + 1 ), n = strlen( s + 1 );
	scanf( "%s", t + 1 ), m = strlen( t + 1 );
	for (int i = 2, j = 0; i <= m; i++) {
		while ( j && t[j+1] != t[i] ) j = Fail[j];
		j += t[j+1] == t[i], Fail[i] = j;
	}
	for (int i = 1, j = 0; i <= n; i++) {
		while ( j && ( t[j+1] != s[i] || j == m ) ) j = Fail[j];
		j += t[j+1] == s[i]; 
		if ( j == m ) printf( "%d
", i - m + 1 );
	}
	for (int i = 1; i <= m; i++) 
		printf( "%d%c", Fail[i], " 
"[ i == m ] );
	return 0;
} 

A deterministic finite automaton (mathcal{T}) called Knuth–Morris–Pratt Automaton when (mathcal{T}=(Q,alpha,delta,q_0,F)) refers to :

  • (|Q|=|s|+1,alpha = { exttt{a}, exttt{b}, exttt{c},cdots})
  • (delta(p,c)=pi(s_p+c))
  • (q_0=0,|F|=1,|s|in F,mathcal{T}(s)= ext{true})

Actually, Knuth–Morris–Pratt Automaton is a simple automaton (mathcal{T}') which only accepts string (s) extended several transition accroding to our (pi) function. So we intend to construct the automaton by using the transition method of (pi(s_i)).

Let's assume for a case where we have already constructed the automaton of (s_i),then we want to extend a character (c). How the algorithm works ? Firstly, we can get the length of the border of (s_i) ( (=pi(s_i)) ) . So it's not hard to find that the state (delta(pi(s_i),c)) is the exact state which we are finding. That is (delta(s_i,c)gets delta(pi(s_i),c)).

Specially, let (delta(s_i,s(i+1))gets i+1) in order to recognize the whole string (s). To sum up, we construct the automaton in (mathcal O(|s| imes |alpha|)) time.

inline void Build(char *s) {
    int len = strlen( s + 1 );
    trans[0][s[1]-'a'] = 1;
    for (int i = 1, j = 0; i < len; i++) {
        for (int k = 0; k < 26; k++) trans[i][k] = trans[j][k];
        trans[i][s[i+1]-'a'] = i+1, j = trans[j][s[i+1]-'a'];
    }
}
原文地址:https://www.cnblogs.com/Parsnip/p/13985001.html