The KMP Algorithm Explained

KMP (Knuth-Morris-Pratt) is a string-matching algorithm.

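I spent quite a while reading up on KMP, but everything I found seemed to skip over the parts that are essential to really understanding it. It took me a good amount of time to work it out, so here I want to explain the algorithm as simply and in as much detail as I can.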

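In what follows, the string we are looking for (the pattern) is called to_match, and the text we search in is called sample_string.

Our goal is to find the position of to_match inside sample_string.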
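
To pin down exactly what we are computing, here is a minimal sketch of the brute-force approach that KMP improves on. This is not KMP, and the name naive_find is my own, used only for illustration. It returns the 0-based position of the first occurrence, or -1 if there is none.

```cpp
#include <string>

// Brute-force baseline (not KMP): try every possible start position in
// sample_string and compare to_match character by character from there.
int naive_find(const std::string& sample_string, const std::string& to_match) {
    const int n = static_cast<int>(sample_string.size());
    const int m = static_cast<int>(to_match.size());
    for (int start = 0; start + m <= n; ++start) {
        int matched = 0;
        while (matched < m && sample_string[start + matched] == to_match[matched])
            ++matched;
        if (matched == m) return start;  // every character matched from `start`
    }
    return -1;  // no occurrence
}
```

After a mismatch this version throws away everything it has learned and restarts one position further along. Avoiding that restart is the whole point of KMP.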

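The core of KMP is an array that records, for every position of to_match, how many characters ending at that position match to_match itself starting from its beginning. We call this array next.

next[n] = m means: the m characters of to_match that end at its n-th character (inclusive, counting from 1) are the same as the first m characters of to_match. Here m is the largest such value that is still smaller than n, since otherwise the whole prefix would trivially match itself.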
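
To make the definition concrete, here is a small sketch that computes next by brute force, straight from the definition. It is deliberately slow and is not how KMP itself builds the table. The function name next_by_definition and the use of std::vector are my own choices for illustration.

```cpp
#include <string>
#include <vector>

// Brute-force next table, computed directly from the definition.
// Indices into next are 1-based; next[0] is unused.
// next[n] is the largest m < n such that the m characters ending at the
// n-th character of to_match equal the first m characters of to_match.
std::vector<int> next_by_definition(const std::string& to_match) {
    const int len = static_cast<int>(to_match.size());
    std::vector<int> next(len + 1, 0);
    for (int n = 1; n <= len; ++n) {
        for (int m = n - 1; m > 0; --m) {  // try the longest candidate first
            // Compare the first m characters with the m characters ending at position n.
            if (to_match.compare(0, m, to_match, n - m, m) == 0) {
                next[n] = m;
                break;
            }
        }
    }
    return next;
}
```

For to_match = "ababc" this gives next[1..5] = 0, 0, 1, 2, 0. For example next[4] = 2, because the two characters ending at the 4th character ("ab") equal the first two characters ("ab").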

Computing the next array:

Note: to_match is indexed from 0 but next is indexed from 1, so to_match[i] corresponds to next[i + 1].

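void get_next(int next[MAXN], const std::string &to_match) {
    next[1] = 0; // The first entry of next defaults to 0, because nothing comes before the first character of to_match.
    int current_match = 0; // current_match is the number of characters currently matched
    for (int i = 1; i < (int)to_match.length(); i++) { // next[1] is already 0, so we start from the second character, to_match[1]

        // This is the hardest part. On a mismatch, jump current_match to next[current_match], the entry that
        // belongs to the last successfully matched character, to_match[current_match - 1].
        // We do not have to start over, because after the jump we are still in a matched state, only with fewer matched characters.
        // This is why KMP beats brute-force matching: after a failure it moves to another matched state instead of restarting.
        // A picture helps. There are really two copies of to_match here: the one walked by the for loop, and the one
        // we are currently matching against it. To keep them apart, call the second copy x_to_match.
        // Right now we have  ...... x_to_match[0] ... x_to_match[current_match - 1] ......
        // but the next character x_to_match[current_match] does not match to_match[i].
        // Suppose next[current_match] is n. After the jump, current_match = n and we have
        //                    ...... x_to_match[0] ... x_to_match[n - 1] ......
        // and x_to_match[0] ... x_to_match[n - 1] is still matched, because the n characters ending at
        // x_to_match[current_match - 1] (inclusive) equal the first n characters. So we are in the state "n characters matched".
        // Make sure you understand this. Once you do, you can write the algorithm yourself.
        // The loop keeps updating the state until x_to_match[current_match] matches to_match[i], or until we are back at the start.
        while (current_match > 0 && to_match[current_match] != to_match[i])
            current_match = next[current_match];

        // If the characters match (possibly only after jumping back), one more character is matched.
        // The original if/else skipped this increment after a jump, which produced wrong next values.
        if (to_match[current_match] == to_match[i]) current_match++;

        next[i + 1] = current_match; // store the current match length in next[i + 1], the entry for to_match[i]
    }
}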
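
To see the jump in isolation, here is a small sketch that lists the values current_match passes through when it keeps jumping back. The helper name fallback_chain is hypothetical, and I assume the next table is stored 1-based in a std::vector with index 0 unused.

```cpp
#include <vector>

// Hypothetical helper: starting from a given current_match, follow
// current_match = next[current_match] until it reaches 0, and record
// every value visited along the way.
// `next` is assumed to be a 1-based table as described in this post.
std::vector<int> fallback_chain(const std::vector<int>& next, int current_match) {
    std::vector<int> chain;
    while (current_match > 0) {
        current_match = next[current_match];
        chain.push_back(current_match);
    }
    return chain;
}
```

For to_match = "ababa" the table is next[1..5] = 0, 0, 1, 2, 3. Starting with all 5 characters matched, the chain is 3, 1, 0: "aba" and then "a" are exactly the shorter beginnings of "ababa" that are also endings of it, so each stop on the chain is still a valid matched state.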

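Once the next array is ready, we can search. The search is almost identical to computing next: x_to_match stays the same, and the string being walked changes from to_match to sample_string.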

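int find_to_match_position_in_sample_string(int next[MAXN], const std::string &to_match, const std::string &sample_string) {
    // Assumes to_match is non-empty and next was filled in by get_next(next, to_match).
    int current_match = 0;
    for (int i = 0; i < (int)sample_string.length(); i++) {

        // On a mismatch, jump back through next until the characters match or nothing is matched.
        while (current_match > 0 && to_match[current_match] != sample_string[i])
            current_match = next[current_match];

        // If the characters match (possibly after jumping back), one more character is matched.
        if (to_match[current_match] == sample_string[i]) current_match++;

        // All of to_match is matched and the match ends at index i, so it starts at i - current_match + 1.
        if (current_match == (int)to_match.length()) return i - current_match + 1;
    }
    return -1; // return -1 if to_match does not occur
}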

That covers the core of the KMP algorithm. In short, there are just two steps:

1. Compute the next array

2. Search using the next array
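
Putting the two steps together, here is a minimal self-contained sketch. The names build_next and kmp_find are my own. I use std::vector instead of a fixed-size array, and returning 0 for an empty to_match is my assumption, since the post does not cover that case.

```cpp
#include <string>
#include <vector>

// Step 1: compute the 1-based next table (next[0] is unused).
std::vector<int> build_next(const std::string& to_match) {
    const int len = static_cast<int>(to_match.size());
    std::vector<int> next(len + 1, 0);
    int current_match = 0;
    for (int i = 1; i < len; ++i) {
        while (current_match > 0 && to_match[current_match] != to_match[i])
            current_match = next[current_match];  // jump to a shorter matched state
        if (to_match[current_match] == to_match[i]) ++current_match;
        next[i + 1] = current_match;
    }
    return next;
}

// Step 2: walk sample_string once, using next to recover from mismatches.
// Returns the 0-based start of the first occurrence, or -1 if there is none.
int kmp_find(const std::string& sample_string, const std::string& to_match) {
    if (to_match.empty()) return 0;  // assumption: an empty pattern matches at 0
    const std::vector<int> next = build_next(to_match);
    const int pattern_length = static_cast<int>(to_match.size());
    int current_match = 0;
    for (int i = 0; i < static_cast<int>(sample_string.size()); ++i) {
        while (current_match > 0 && to_match[current_match] != sample_string[i])
            current_match = next[current_match];
        if (to_match[current_match] == sample_string[i]) ++current_match;
        if (current_match == pattern_length) return i - current_match + 1;
    }
    return -1;
}
```

For example, kmp_find("abababc", "ababc") returns 2. At i = 4 the search has 4 characters matched and hits a mismatch ('a' against 'c'), jumps to next[4] = 2, and carries on from there without ever moving backwards in sample_string.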

If this post helps even one person understand KMP, I will be rolling on the floor with joy.