HDU--2222--Keywords Search--AC automaton (Aho-Corasick)

Keywords Search

Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 131072/131072 K (Java/Others)
Total Submission(s): 44594    Accepted Submission(s): 14056


Problem Description
In modern times, search engines such as Google and Baidu have become part of everyone's life.
Wiskey also wants to bring this feature to his image retrieval system.
Every image has a long description. When users type some keywords to find an image, the system matches the keywords against the image descriptions and shows the image that matches the most keywords.
To simplify the problem, you are given the description of an image and some keywords, and you should tell me how many keywords are matched.
 
Input
The first line contains one integer, the number of test cases that follow.
Each case begins with an integer N, the number of keywords, followed by N keywords. (N <= 10000)
Each keyword contains only the characters 'a'-'z' and is no longer than 50 characters.
The last line is the description, which is no longer than 1000000 characters.
 
Output
Print how many keywords are contained in the description.
 
Sample Input
1
5
she
he
say
shr
her
yasherhs
 
Sample Output
3

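Problem: given N strings and then a text, count how many of the strings occur in the text. A string that occurs more than once in the text is counted only once.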
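
To make the task concrete, here is a minimal brute-force sketch that checks each keyword with `std::string::find`. It is not the approach used in this post and would be far too slow at the stated limits (10000 keywords against a text of 1000000 characters); it only serves to cross-check small cases such as the sample. The function name `countMatchedNaive` is made up for this illustration.

```cpp
#include <string>
#include <vector>

// Naive cross-check (illustrative name, not from the original post):
// count the keywords that occur at least once in the text.
int countMatchedNaive(const std::vector<std::string>& keywords,
                      const std::string& text)
{
    int matched = 0;
    for (const std::string& kw : keywords)
    {
        // find() returns npos when kw does not occur anywhere in text
        if (text.find(kw) != std::string::npos)
            ++matched;
    }
    return matched;
}
```

Called with the sample keywords she, he, say, shr, her and the text yasherhs, it finds she, he and her, which agrees with the sample output of 3.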

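P.S. I only learned the AC automaton today. Once it clicks, it really is easy. I have written up my notes on the algorithm in this blog, and I hope we can all improve together.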

#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;

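struct node
{
    // fail: where to jump when matching fails; the string spelled by the
    // fail node is a suffix of the string spelled by this node
    node *fail, *next[26];
    int x; // number of keywords ending at this node (-1 once query has counted it)
    node() // initialize an empty node
    {
        fail = NULL;
        x = 0;
        for (int i = 0; i < 26; i++) next[i] = NULL;
    }
} *root;

node *q[500010];   // BFS queue: at most 10000 * 50 trie nodes plus the root
char str[1000010]; // the description text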
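void setit(char *ss) // insert one keyword into the trie
{
    node *p = root;
    for (int i = 0; ss[i]; i++)
    {
        int k = ss[i] - 'a';
        if (p->next[k] == NULL)      // no node yet for this character
            p->next[k] = new node(); // create it and attach it below p
        p = p->next[k];
    }
    p->x++; // one more keyword ends here
}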

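void ac() // build the fail pointers of the AC automaton, analogous to KMP's next array
{
    int tail, head;
    tail = head = 0;
    // BFS visits the trie level by level, so a fail pointer only ever
    // points to a node on a shallower level that has already been processed
    q[tail++] = root;
    while (head != tail)
    {
        node *p = q[head++]; // take the current node
        for (int i = 0; i < 26; i++) // try all 26 letters (index = letter - 'a')
            if (p->next[i] != NULL) // this child exists
            {
                if (p == root) // child of the root, i.e. the second level
                {
                    // a mismatch on the very first character restarts from the root
                    p->next[i]->fail = root;
                    q[tail++] = p->next[i]; // enqueue
                    continue;
                }
                node *cur = p->fail; // start from the parent's fail node
                while (cur != NULL) // stops at NULL because root->fail is initialized to NULL
                {
                    if (cur->next[i] != NULL) // this suffix node can be extended by the same letter
                    {
                        p->next[i]->fail = cur->next[i]; // the child's fail is that extension
                        break;
                    }
                    cur = cur->fail;
                }
                if (cur == NULL) // no suffix could be extended
                    p->next[i]->fail = root; // so the fail pointer falls back to the root
                q[tail++] = p->next[i]; // enqueue
            }
    }
}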

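int query() // match str against the trie after ac() has built the fail pointers
{
    int sum = 0;
    node *p = root;
    for (int i = 0; str[i]; i++) // scan every character of the text
    {
        int cur = str[i] - 'a'; // index of the current letter
        // while this node has no child for the letter, follow fail pointers, at most up to the root
        while (p->next[cur] == NULL && p != root)
            p = p->fail;
        // step to the child if there is one; otherwise p is the root and stays there
        if (p->next[cur] != NULL) p = p->next[cur];
        node *t = p; // walk the fail chain with t so that p is left untouched
        while (t != root && t->x != -1) // until the root or a node that was already counted
        {
            sum += t->x; // add the keywords that end here
            t->x = -1;   // mark as counted so this chain is not walked again
            t = t->fail;
        }
    }
    return sum;
}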

int main(void)
{
    int t, n;
    scanf("%d", &t);
    while (t-- && scanf("%d", &n) == 1)
    {
        root = new node(); // fresh trie for each case (nodes of earlier cases are not freed)
        for (int i = 0; i < n; i++)
        {
            char ss[55];
            scanf("%s", ss);
            setit(ss);
        }
        ac();
        scanf("%s", str);
        printf("%d\n", query());
    }
    return 0;
}
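
The listing above allocates every node with `new` and never frees it between test cases. As a possible alternative, here is a minimal sketch of the same three steps (trie insert, BFS for the fail links, one scan of the text) with nodes stored by index in a `std::vector`, so each case can simply start from a fresh object. The names `AhoCorasick`, `addKeyword`, `build` and `countMatched` are invented for this sketch and are not part of the original solution; like the original, it assumes the input contains only 'a'-'z'.

```cpp
#include <array>
#include <queue>
#include <string>
#include <vector>

// Index-based sketch; all names are illustrative, not from the original post.
struct AhoCorasick
{
    struct Node
    {
        std::array<int, 26> next; // child index per letter, -1 if absent
        int fail = 0;             // index of the fail node (0 is the root)
        int count = 0;            // keywords ending here, -1 once counted
        Node() { next.fill(-1); }
    };
    std::vector<Node> nodes;

    AhoCorasick() : nodes(1) {} // node 0 is the root

    void addKeyword(const std::string& word) // assumes only 'a'-'z'
    {
        int cur = 0;
        for (char ch : word)
        {
            int c = ch - 'a';
            if (nodes[cur].next[c] == -1)
            {
                nodes[cur].next[c] = static_cast<int>(nodes.size());
                nodes.emplace_back();
            }
            cur = nodes[cur].next[c];
        }
        ++nodes[cur].count; // duplicate keywords are counted separately
    }

    void build() // BFS, so a parent's fail link is ready before its children's
    {
        std::queue<int> bfs;
        for (int c = 0; c < 26; ++c)
            if (nodes[0].next[c] != -1) bfs.push(nodes[0].next[c]); // fail stays 0
        while (!bfs.empty())
        {
            int u = bfs.front();
            bfs.pop();
            for (int c = 0; c < 26; ++c)
            {
                int v = nodes[u].next[c];
                if (v == -1) continue;
                int f = nodes[u].fail;
                while (f != 0 && nodes[f].next[c] == -1) f = nodes[f].fail;
                int target = nodes[f].next[c];
                nodes[v].fail = (target != -1) ? target : 0;
                bfs.push(v);
            }
        }
    }

    int countMatched(const std::string& text) // consumes the counts; call once
    {
        int sum = 0, cur = 0;
        for (char ch : text)
        {
            int c = ch - 'a';
            while (cur != 0 && nodes[cur].next[c] == -1) cur = nodes[cur].fail;
            if (nodes[cur].next[c] != -1) cur = nodes[cur].next[c];
            for (int t = cur; t != 0 && nodes[t].count != -1; t = nodes[t].fail)
            {
                sum += nodes[t].count;
                nodes[t].count = -1; // never walk this part of the chain again
            }
        }
        return sum;
    }
};
```

A typical use would be to construct one `AhoCorasick` per test case, call `addKeyword` for each keyword, then `build()` once, then `countMatched` on the description. Marking counted nodes with -1 is the same trick as in `query` above: each node on a fail chain is visited at most once over the whole scan, which keeps the extra work bounded by the number of trie nodes.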

Original post: https://www.cnblogs.com/yfceshi/p/7103187.html