字符串哈希+kmp题

9.7

Crazy Search(字符串哈希)

Many people like to solve hard puzzles some of which may lead them to madness. One such puzzle could be finding a hidden prime number in a given text. Such number could be the number of different substrings of a given size that exist in the text. As you soon will discover, you really need the help of a computer and a good algorithm to solve such a puzzle.
Your task is to write a program that given the size, N, of the substring, the number of different characters that may occur in the text, NC, and the text itself, determines the number of different substrings of size N that appear in the text.

As an example, consider N=3, NC=4 and the text "daababac". The different substrings of size 3 that can be found in this text are: "daa"; "aab"; "aba"; "bab"; "bac". Therefore, the answer should be 5.

Input

The first line of input consists of two numbers, N and NC, separated by exactly one space. This is followed by the text where the search takes place. You may assume that the maximum number of substrings formed by the possible set of characters does not exceed 16 Millions.

Output

The program should output just an integer corresponding to the number of different substrings of size N found in the given text.

Sample Input

3 4
daababac

Sample Output

5

题意:就是给你n(字符长)m(字符种类),然后给你长的字符串,让你查找出以n为长度的不同的字符串有多少个
#include<iostream>
#include<cstdio>
#include<cstring>
using namespace std;
const int maxn=16e6;
bool hash[maxn];
char word[26];
int main()
{
 int n,nc;
 string s;
 cin>>n>>nc>>s;
 memset(hash,0,sizeof(hash));
 memset(word,0,sizeof(word));
 int k=0;
 
 for(int i=0;i<s.length();i++)
 {
  if(word[s[i]-'a']==0)
     word[s[i]-'a']=k++;
  if(k==nc)break;
 }
 
 int len=s.size()-n+1;//daababac三个三个一组则有六组
 int ans=len;//最好的情况,都不相同
 
  //哈希值是唯一的,这里用hash标记
 for(int i=0;i<len;i++)
 {
  int num=0;
  for(int j=i;j<i+n;j++)
    num=num*nc+word[s[j]-'a'];
  num%=maxn;
  if(hash[num])
     ans--;
  else
    hash[num]=1;
 }
 cout<<ans<<endl;
 return 0;
}

 9.8

String  Problem(字符串最大最小表示)

Give you a string with length N, you can generate N strings by left shifts. For example let consider the string “SKYLONG”, we can generate seven strings:
String Rank
SKYLONG 1
KYLONGS 2
YLONGSK 3
LONGSKY 4
ONGSKYL 5
NGSKYLO 6
GSKYLON 7
and lexicographically first of them is GSKYLON, lexicographically last is YLONGSK, both of them appear only once.
  Your task is easy, calculate the lexicographically fisrt string’s Rank (if there are multiple answers, choose the smallest one), its times, lexicographically last string’s Rank (if there are multiple answers, choose the smallest one), and its times also.

Input  Each line contains one line the string S with length N (N <= 1000000) formed by lower case letters.OutputOutput four integers separated by one space, lexicographically fisrt string’s Rank (if there are multiple answers, choose the smallest one), the string’s times in the N generated strings, lexicographically last string’s Rank (if there are multiple answers, choose the smallest one), and its times also.Sample Input

abcder
aaaaaa
ababab

Sample Output

1 1 6 1
1 6 1 6
1 3 2 3

题解:这道题就是输出所给字符串的最大最小表示的起始位置和数量!
这道题是kmp以及最大最小表示法的应用,数量我们可以用kmp中求循环节的函数,然后结合最大最小表示法模板
#include<bits/stdc++.h>
using namespace std;
const int maxn=1e6+10;
char s[maxn];
int ne[maxn];
void kmpnext(char s[],int len)
{
    int i=0,j=-1;
    ne[0]=-1;
    while(i<len)
    {
        if(j==-1||s[i]==s[j])
        {
            i++,j++;
            ne[i]=j;
        }
        else
          j=ne[j];
    }
}
int mmstring(int flag,char q[])//最大最小表示法模板
{
    int len=strlen(q);
    int i=0,j=1,k=0;
    while(i<len&&j<len&&k<len)
    {
        int t=q[(i+k)%len]-q[(j+k)%len];
        if(!t)
           k++;
        else
        {
            if(flag==0)
            {
                if(t>0)i=i+k+1;
                else  j=j+k+1;
            }
            else
            {
                if(t>0)j=j+k+1;
                else  i=i+k+1;
            }
            if(i==j)  j++;
            k=0;
        }
    }
    if(i<j)return i;
    else  return j;
}
int main()
{
    int minn,maxx;
    while(~scanf("%s",s))
    {
        int len=strlen(s);
        kmpnext(s,len);
        int t=1;
        if(len%(len-ne[len])==0)
           t=len/(len-ne[len]);
        minn=mmstring(0,s);
        maxx=mmstring(1,s);
        printf("%d %d %d %d
",minn+1,t,maxx+1,t);
    }
    return 0;
}
9.9
Period(kmp模板)
For each prefix of a given string S with N characters (each character has an ASCII code between 97 and 126, inclusive), we want to know whether the prefix is a periodic string. That is, for each i (2 <= i <= N) we want to know the largest K > 1 (if there is one) such that the prefix of S with length i can be written as A K ,that is A concatenated K times, for some string A. Of course, we also want to know the period K.
Input
The input consists of several test cases. Each test case consists of two lines. The first one contains N (2 <= N <= 1 000 000) – the size of the string S.The second line contains the string S. The input file ends with a line, having the
number zero on it.

Output
For each test case, output "Test case #" and the consecutive test case number on a single line; then, for each prefix with length i that has a period K > 1, output the prefix size i and the period K separated by a single space; the prefix sizes must be in increasing order. Print a blank line after each test case.

Sample Input
3
aaa
12
aabaabaabaab
0
Sample Output
Test case #1
2 2
3 3

Test case #2
2 2
6 2
9 3
12 4
题解:这道题是kmp模板题,给定字符串,依次查找,判断有循环的位置以及循环个数,主要是用存循环节的函数
#include<iostream>
using namespace std;
const int maxn=1e6+10;
int ne[maxn]; 
char s[maxn];
void kmpnext(char s[],int n)
{
    int i=0,j=-1;
    ne[0]=-1;
    while(i<n)
    {
        if(j==-1||s[i]==s[j])
        {
            i++,j++;
            int t=i/(i-j);
            if(i%(i-j)==0&&t>1)
               cout<<i<<" "<<t<<endl;
            if(s[i]!=s[j])ne[i]=j;
            else
              ne[i]=ne[j];
        }
        else
          j=ne[j];
    }
}
int main()
{
    int x=1,n;
    while(cin>>n&&n)
    {
        cin>>s;
        cout<<"Test case #"<<x<<endl;
        kmpnext(s,n);
        x++;
        cout<<endl;
    }
    return 0;
}
9。10
string matching(扩展kmp)
String matching is a common type of problem in computer science. One string matching problem is as following:

Given a string s[0len1]s[0…len−1], please calculate the length of the longest common prefix of s[ilen1]s[i…len−1] and s[0len1]s[0…len−1] for each i>0i>0.

I believe everyone can do it by brute force.
The pseudo code of the brute force approach is as the following:



We are wondering, for any given string, what is the number of compare operations invoked if we use the above algorithm. Please tell us the answer before we attempt to run this algorithm.
Input
The first line contains an integer TT, denoting the number of test cases.
Each test case contains one string in a line consisting of printable ASCII
characters except space.
1T301≤T≤30
* string length 106≤106 for every string
OutputFor each test, print an integer
in one line indicating the number of compare operations invoked if we run the
algorithm in the statement against the input string.
Sample Input
3
_Happy_New_Year_
ywwyww
zjczzzjczjczzzjc
Sample Output
17
7
32
题意:
字符串匹配是计算机科学中常见的问题类型。一个字符串匹配问题如下:给定字符串s [0 ... len-1],请计算
每个i的s [i ... len-1]和s [0 ... len-1]的最长公共前缀的长度> 0。
我相信每个人都可以通过蛮力来做到这一点。蛮力方法的伪代码如下:我们想知道,
对于任何给定的字符串,如果我们使用上述算法,调用的比较操作的数量是多少。在我们尝试运行此算法之前,请告诉我们答案

这道题很懵,看了许多博客,也还是迷迷糊糊的的,扩展kmp,没搞懂,,,,
  • next[i]: T[i]...T[m - 1]与 T 的最长相同前缀长度;
  • extend[i]: S[i]...S[n - 1]与 T 的最长相同前缀长度。

#include<cstdio>
#include<iostream>
#include<cstring>
using namespace std;
const int maxn=1e6+10;
int Next[maxn];
int expand[maxn];
char s[maxn];
void GNext(char t[],int le,int Next[])
{
    int a=0,b=0;
    Next[0]=le;
    for(int i=1;i<le;i++)
    {
        if(i>=b||i+Next[i-a]>=b)
        {
            if(i>=b)
               b=i;
            while(b<le&&t[b]==t[b-i])
               b++;
            Next[i]=b-i;
            a=i; 
        }
        else
           Next[i]=Next[i-a];
    }
}
void Gexpand(char s[],char t[],int expand[],int Next[])
{
    int a=0,b=0;
    int len1=strlen(s);
    int len2=strlen(t);
    GNext(t,len2,Next);
    for(int i=0;i<len1;i++)
    {
        if(i>=b||i+Next[i-a]>=b)
        {
            if(i>=b)  b=i;
            while(b<len1&&b-i<len2&&s[b]==t[b-i])
               b++;
            expand[i]=b-i;
            a=i;
        }
        else
           expand[i]=Next[i-a];
    }
}
int main()
{
    int t;
    scanf("%d",&t);
    while(t--)
    {
        scanf("%s",s);
        Gexpand(s+1,s,expand,Next);
        int len=strlen(s);
        len--;
        long long ans=0;
        for(int i=0;i<len;i++)
        {
            if(i+expand[i]<len)
               ans+=expand[i]+1;
            else
              ans+=expand[i];
        }
        printf("%lld
",ans);
    }
    return 0;
}

 总结:这周除了一些思维题,就是和kmp相关算法的题了,最大最小+kmp,扩展kmp,在学习过程中,也看了一些kmp讲解的视频,也搞懂了next存的前后缀,循环节,稍简单的还能做,但一和其他算法联结,就蒙圈了,看题解,但有些地方还是比较晦涩难懂,手动模拟,看视频,有些部分仍旧不能很好的连接!



 
原文地址:https://www.cnblogs.com/ylrwj/p/11478327.html