Milk Patterns

Luogu P2852
Problem description

Farmer John has noticed that the quality of milk given by his cows varies from day to day. On further investigation, he discovered that although he can’t predict the quality of milk from one day to the next, there are some regular patterns in the daily milk quality.

To perform a rigorous study, he has invented a complex classification scheme by which each milk sample is recorded as an integer between 0 and 1,000,000 inclusive, and has recorded data from a single cow over N (1 ≤ N ≤ 20,000) days. He wishes to find the longest pattern of samples which repeats identically at least K (2 ≤ K ≤ N) times. This may include overlapping patterns – 1 2 3 2 3 2 3 1 repeats 2 3 2 3 twice, for example.

Help Farmer John by finding the longest repeating subsequence in the sequence of samples. It is guaranteed that at least one subsequence is repeated at least K times.

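In short: John recorded the milk quality, an integer from 0 to 1,000,000, on each of N (1 ≤ N ≤ 20,000) days. A "pattern" is a run of consecutive days, and he wants the length of the longest pattern that occurs at least K (2 ≤ K ≤ N) times. For example, in 1 2 3 2 3 2 3 1 the pattern 2 3 2 3 occurs twice, so for K = 2 the answer is 4.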

Input/output format

Input format:
Line 1: Two space-separated integers: N and K

Lines 2..N+1: N integers, one per line, the quality of the milk on day i appears on the ith line.

Output format:
Line 1: One integer, the length of the longest pattern which occurs at least K times

Sample input/output

Sample input #1:
8 2
1
2
3
2
3
2
3
1

Sample output #1:
4
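
To make the statement concrete, here is a minimal brute-force sketch. It is not the intended solution and is far too slow for N = 20,000; the function name `longestPatternBrute` is made up for this illustration. It counts every contiguous block with a map, so overlapping occurrences are counted exactly as in the statement.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical helper: length of the longest contiguous block that occurs
// at least k times (overlaps allowed). Roughly cubic time, illustration only.
int longestPatternBrute(const std::vector<int>& s, int k)
{
    assert(k >= 2);
    int n = (int)s.size();
    for (int len = n; len >= 1; len--)          // try the longest length first
    {
        std::map<std::vector<int>, int> seen;   // block -> number of occurrences
        for (int i = 0; i + len <= n; i++)
        {
            std::vector<int> block(s.begin() + i, s.begin() + i + len);
            if (++seen[block] >= k) return len; // first hit is the longest
        }
    }
    return 0;                                   // nothing occurs k times
}
```

On the sample sequence 1 2 3 2 3 2 3 1 this returns 4 for K = 2 and 2 for K = 3 (the block 2 3 occurs three times).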

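Note: Luogu's test data for this problem is weak. A submission without discretization, using m=1000 as the counting-sort range, was accepted. Sample values can be as large as 1,000,000, though, so a robust solution should discretize the values first.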
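
Discretization here just means replacing each value by its rank among the distinct values, which keeps the order of suffixes unchanged while shrinking the alphabet to at most N symbols. A minimal sketch, assuming the samples sit in a `std::vector<int>`; the helper name `discretize` is hypothetical and not part of the submission.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: replaces each value by its 1-based rank among the
// distinct values and returns the number of distinct values.
int discretize(std::vector<int>& v)
{
    std::vector<int> sorted(v);
    std::sort(sorted.begin(), sorted.end());
    sorted.erase(std::unique(sorted.begin(), sorted.end()), sorted.end());
    int distinct = (int)sorted.size();
    for (int& x : v)
    {
        x = (int)(std::lower_bound(sorted.begin(), sorted.end(), x) - sorted.begin()) + 1;
        assert(x >= 1 && x <= distinct);   // every value maps into [1, distinct]
    }
    return distinct;
}
```

With ranks in [1, distinct], a counting sort only needs a range of distinct + 1, which never exceeds N + 1.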
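Analysis: build the suffix array and compute height. We want the longest contiguous pattern that occurs at least k times. Every occurrence is a prefix of some suffix, so we are looking for k suffixes with the longest possible common prefix. These k suffixes can always be taken consecutive in rank order, because suffixes that are adjacent in rank are the most similar ones. So we compute the longest common prefix of every group of k rank-consecutive suffixes and take the maximum over all groups.
One point needs care: we want the longest common prefix of k suffixes, but height[i] is the longest common prefix of the suffixes ranked i and i-1, so each group of k suffixes corresponds to a window of only k-1 height values.
Recall that the longest common prefix of the suffixes ranked l and r (l < r) is min(height[l+1..r]).
An RMQ structure (a sparse table here) answers each window minimum. Simply enumerating i and scanning the k-1 entries of each window by brute force also works.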
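
The brute-force variant of this idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the submitted solution: the suffix array is built by plain sorting and height by direct comparison (both much slower than the doubling construction), and the names `naiveSuffixArray`, `naiveHeight` and `longestPattern` are made up for the example. For the sample 1 2 3 2 3 2 3 1 (0-indexed), the sorted suffixes start at 7 0 5 3 1 6 4 2 and height is 0 1 0 2 4 0 1 3, so the best window of one entry (K = 2) has minimum 4.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Naive suffix array: sort start positions by comparing suffixes directly.
// O(n^2 log n) in the worst case, for illustration only.
std::vector<int> naiveSuffixArray(const std::vector<int>& s)
{
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return std::lexicographical_compare(s.begin() + a, s.end(),
                                            s.begin() + b, s.end());
    });
    return sa;
}

// height[i] = LCP of the suffixes ranked i-1 and i (height[0] = 0).
std::vector<int> naiveHeight(const std::vector<int>& s, const std::vector<int>& sa)
{
    int n = (int)s.size();
    std::vector<int> h(n, 0);
    for (int i = 1; i < n; i++)
    {
        int a = sa[i - 1], b = sa[i], k = 0;
        while (a + k < n && b + k < n && s[a + k] == s[b + k]) k++;
        h[i] = k;
    }
    return h;
}

// Maximum, over all windows of k-1 consecutive height entries, of the window minimum.
int longestPattern(const std::vector<int>& s, int k)
{
    int n = (int)s.size();
    assert(k >= 2 && k <= n);
    std::vector<int> h = naiveHeight(s, naiveSuffixArray(s));
    int ans = 0;
    for (int i = 1; i + k - 2 < n; i++)   // window h[i .. i+k-2]
        ans = std::max(ans, *std::min_element(h.begin() + i, h.begin() + i + k - 1));
    return ans;
}
```

The window starts at i = 1 because height[0] has no predecessor, and it stops as soon as i+k-2 would run past the last rank, so every window really covers k suffixes.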

#include<cstdio>
#include<cstring>
#include<iostream>
#include<cmath>
#include<algorithm>  //std::min, std::max

using namespace std;

const int INF=100000010;
const int N=100010;
int sa[N],rak[N],a[N],b[N],hei[N],num[N],len,cc[N],kk;
int f[N][40];

int cmp(int *y,int a,int b,int k)
{
    int ra1=y[a];
    int rb1=y[b];
    int ra2= a+k>=len ? -1:y[a+k];
    int rb2= b+k>=len ? -1:y[b+k];
    return ra1==rb1&&ra2==rb2;
}

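void make_sa()
{
    int i,k,m,p,*x=a,*y=b,*t;
    //Discretize first: samples go up to 1,000,000 but cc[] has only N slots.
    //id[v] becomes the 0-based rank of value v among the distinct samples.
    static int id[1000001];
    for (i=0;i<len;i++) id[num[i]]=1;
    m=0;
    for (i=0;i<=1000000;i++) if (id[i]) id[i]=m++;
    //counting sort on the first key
    for (i=0;i<m;i++) cc[i]=0;
    for (i=0;i<len;i++) ++cc[x[i]=id[num[i]]];
    for (i=1;i<m;i++) cc[i]+=cc[i-1];
    for (i=len-1;i>=0;i--) sa[--cc[x[i]]]=i;
    for (k=1;k<=len;k<<=1)
    {
        p=0;
        //y lists suffixes by second key; those without a second key come first
        for (i=len-k;i<len;i++) y[p++]=i;
        for (i=0;i<len;i++) if (sa[i]>=k) y[p++]=sa[i]-k;
        //stable counting sort on the first key
        for (i=0;i<m;i++) cc[i]=0;
        for (i=0;i<len;i++) ++cc[x[y[i]]];
        for (i=1;i<m;i++) cc[i]+=cc[i-1];
        for (i=len-1;i>=0;i--) sa[--cc[x[y[i]]]]=y[i];
        //rebuild ranks: equal (first,second) key pairs share a rank
        t=x;x=y;y=t;
        x[sa[0]]=0;
        p=1;
        for (i=1;i<len;i++)
        {
            x[sa[i]]=cmp(y,sa[i-1],sa[i],k) ? p-1:p++;
        }
        if (p>=len) break;  //all ranks distinct, done
        m=p;
    }
}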

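void make_hei()  //suffixes are visited in text order; hei is indexed by rank
{
    int i,k=0;
    hei[0]=0;
    for (i=0;i<len;i++) rak[sa[i]]=i;
    for (i=0;i<len;i++)
    {
        if (!rak[i]) { k=0; continue; }  //smallest suffix has no predecessor
        int j=sa[rak[i]-1];
        if (k) k--;
        //bounds checks: num[len] is 0, which is also a legal sample value
        while (i+k<len&&j+k<len&&num[i+k]==num[j+k]) k++;
        hei[rak[i]]=k;
    }
    return;
}
//SA answers "who is ranked i-th?", RANK answers "what is your rank?"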
void cl()   
{
    int unit=log(len)/log(2)+1;
    int i,j;
    for (i=0;i<len;i++) f[i][0]=hei[i];
    for (i=1;i<=unit;i++)
        for (j=0;j<len;j++)
           if (j+(1<<i)-1<len)
              f[j][i]=min(f[j][i-1],f[j+(1<<(i-1))][i-1]);
    return;
}

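void solve()
{
    int i,j,ans=0;
    int w=kk-1;  //k rank-consecutive suffixes span k-1 height entries
    int unit=0;
    while ((1<<(unit+1))<=w) unit++;  //floor(log2(w)) without floating point
    for (i=1;i+w-1<len;i++)  //only full windows hei[i..i+w-1], skipping hei[0]
    {
        j=i+w-1;
        ans=max(ans,min(f[i][unit],f[j-(1<<unit)+1][unit]));
    }
    printf("%d",ans);
    return;
}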

int main()
{
    scanf("%d%d",&len,&kk);
    for (int i=0;i<len;i++)
        scanf("%d",&num[i]);
    make_sa();
    make_hei();
    cl();  //RMQ
    solve();
    return 0;
}