Oulipo (poj3461

Oulipo
Time Limit: 1000MS   Memory Limit: 65536K
Total Submissions: 29759   Accepted: 11986

Description

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A''B''C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

  • One line with the word W, a string over {'A''B''C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
  • One line with the text T, a string over {'A''B''C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Sample Output

1
3
0

Source

题意:两个字符串,在第二个字符串中可以找到多少个含有第一个字符串的区间……
kmp能干嘛,next是干嘛的。
按照正常思维一个个查的话,效率太低,于是伟大的前辈们就创造出了针对如何快速查找字符串的高大上的kmp。
 
理解:    因为每次要查找的时候都会对字符串从头开始查找,但是如果我们已经知道在查找字符串当前位置的时候,有部分前缀和后缀是相等的,但是当前位置又不相等,
那下次查找的时候就不用从头开始找啦,因为前缀和后缀相等,所以前缀和后缀相等的长度就不用查询了,就从已经查找到的位置减去前缀和后缀相等的长度开始查询就行了。
那么问题又转化成了怎么求字符串当前位置前缀和后缀相等的长度。用数组next存储,next[j] = k,意味着:p[0, k-1] = p[j-k, j-1].
 
 
因此KMP算法的关键在于求算next[]数组的值,即求算模式串每个位置处的最长后缀与前缀相同的长度, 而求算next[]数组的值有两种思路,第一种思路是用递推的思想去求算,还有一种就是直接去求解。

1.按照递推的思想:

   根据定义next[0]=-1,假设next[j]=k, 即P[0...k-1]==P[j-k,j-1]

   1)若P[j]==P[k],则有P[0..k]==P[j-k,j],很显然,next[j+1]=next[j]+1=k+1;

   2)若P[j]!=P[k],则可以把其看做模式匹配的问题,即匹配失败的时候,k值如何移动,显然k=next[k]。

   因此可以这样去实现:

 1 void getNext(char *p,int *next)
 2 {
 3     int j,k;
 4     next[0]=-1;
 5     j=0;
 6     k=-1;
 7     while(j<strlen(p)-1)
 8     {
 9         if(k==-1||p[j]==p[k])    //匹配的情况下,p[j]==p[k]
10         {
11             j++;
12             k++;
13             next[j]=k;
14         }
15         else                   //p[j]!=p[k]
16             k=next[k];
17     }
18 }
View Code

问题: next应该到数组长度……

优化一点:

 1 void getNext(char *p,int *next)
 2 {
 3     int j,k;
 4     int lastIndex=strlen(p)-1;
 5     next[0]=-1;
 6     j=0;
 7     k=-1;
 8     while(j < lastIndex)
 9     {
10         if(k==-1||p[j]==p[k])    //匹配的情况下,p[j]==p[k]
11         {
12             ++j;
13             ++k;
14              
15             //若p[j]==p[k],则需要修正
16             if(p[j]==p[k]) 
17                 next[j]=next[k];
18             else
19                 next[j]=k;
20         }
21         else                   //p[j]!=p[k]
22             k=next[k];
23     }
24 }
View Code

2.直接求解方法

 1 void getNext(char *p,int *next)
 2 {
 3     int i,j,temp;
 4     for(i=0;i<strlen(p);i++)
 5     {
 6         if(i==0)
 7         {
 8             next[i]=-1;     //next[0]=-1
 9         }
10         else if(i==1) 
11         {
12             next[i]=0;      //next[1]=0
13         }
14         else
15         {
16             temp=i-1;
17             for(j=temp;j>0;j--)
18             {
19                 if(equals(p,i,j))
20                 {
21                     next[i]=j;   //找到最大的k值
22                     break;
23                 }
24             }
25             if(j==0)
26                 next[i]=0;
27         }
28     }
29 }
30 
31 bool equals(char *p,int i,int j)     //判断p[0...j-1]与p[i-j...i-1]是否相等  
32 {
33     int k=0;
34     int s=i-j;
35     for(;k<=j-1&&s<=i-1;k++,s++)
36     {
37         if(p[k]!=p[s])
38             return false;
39     }
40     return true;
41 }
View Code

本题代码:

 1 #include<iostream>
 2 #include<cstdio>
 3 #include<cstring>
 4 
 5 using namespace std;
 6 
 7 char w[10008], t[1000008];
 8 int next[10008];
 9 
10 void getNext()
11 {
12     int j, k;
13     int i = strlen(w);
14     j = 0;
15     k = -1;
16     next[0] = -1;
17 
18     while(j < i)
19     {
20         if(k == -1 || w[j] == w[k])
21         {
22             j++;
23             k++;
24             next[j] = k;
25         }
26         else
27             k = next[k];
28     }
29 }
30 
31 int main()
32 {
33     int c;
34     scanf("%d", &c);
35     while(c--)
36     {
37         memset(next, 0, sizeof(next));
38         scanf("%s", w);
39         scanf("%s", t);
40         getNext();
41         int i, j, l1, l2;
42         l1 = strlen(w);
43         l2 = strlen(t);
44         i = j = 0;
45         int sum = 0;
46         while(i < l1 && j < l2)
47         {
48             if(w[i] == t[j] || i == -1)
49             {
50                 i++;
51                 j++;
52             }
53             else
54                 i = next[i];            
55 
56             if(i == l1)
57             {
58                 sum++;
59                 i = next[i];
60             }
61         }
62         printf("%d
", sum);
63     }
64     return 0;
65 }
原文地址:https://www.cnblogs.com/Tinamei/p/4798918.html