UVA 11732: Trie Application

strcmp() Anyone? Time Limit:2000MS     Memory Limit:0KB     64bit IO Format:%lld & %llu

Description

Problem J: “strcmp()” Anyone?
Input: Standard Input
Output: Standard Output

strcmp() is a library function in C/C++ which compares two strings. It takes two strings as input parameters and decides which one is lexicographically larger or smaller: if the first string is greater, it returns a positive value; if the second string is greater, it returns a negative value; and if the two strings are equal, it returns zero. The code that is used to compare two strings in the C/C++ library is shown below:

int strcmp(char *s, char *t)
{
    int i;
    for (i=0; s[i]==t[i]; i++)
        if (s[i]=='\0')
            return 0;
    return s[i] - t[i];
}

Figure: The standard strcmp() code provided for this problem.

The number of comparisons required to compare two strings in strcmp() function is never returned by the function. But for this problem you will have to do just that at a larger scale. strcmp() function continues to compare characters in the same position of the two strings until two different characters are found or both strings come to an end. Of course it assumes that last character of a string is a null (‘\0’) character. For example the table below shows what happens when “than” and “that”; “therE” and “the” are compared using strcmp() function. To understand how 7 comparisons are needed in both cases please consult the code block given above.

String 1   String 2   First mismatch   Return value   Comparisons
than       that       'n' ≠ 't'        negative       7
therE      the        'r' ≠ '\0'       positive       7
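The figure of 7 comparisons can be checked with an instrumented copy of the strcmp() code given above. This is only a sketch: the name countComparisons and the optional result parameter are hypothetical and not part of the problem statement.

```cpp
#include <cstddef>

// Instrumented copy of the problem's strcmp() (hypothetical helper).
// Returns how many character comparisons the loop performs; if result is
// non-null, the usual strcmp() return value is stored in *result.
long long countComparisons(const char *s, const char *t, int *result = nullptr) {
    long long comparisons = 0;
    std::size_t i = 0;
    for (;;) {
        ++comparisons;              // the loop test s[i] == t[i]
        if (s[i] != t[i]) break;
        ++comparisons;              // the test s[i] == '\0'
        if (s[i] == '\0') {
            if (result) *result = 0;
            return comparisons;
        }
        ++i;
    }
    if (result) *result = s[i] - t[i];
    return comparisons;
}
```

For “than” and “that” the loop runs three full rounds (6 comparisons) and stops on the fourth s[i]==t[i] test, which gives 7. The same happens for “therE” and “the”, where 'r' is compared against the terminating '\0'.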

Input

The input file contains maximum 10 sets of inputs. The description of each set is given below:

Each set starts with an integer N (0<N<4001) which denotes the total number of strings. Each of the next N lines contains one string. Strings contain only alphanumerals (‘0’… ‘9’, ‘A’… ‘Z’, ‘a’… ‘z’), and have a maximum length of 1000 and a minimum length of 1.

Input is terminated by a line containing a single zero. Input file size is around 23 MB.

Output

For each set of input produce one line of output. This line contains the serial of output followed by an integer T. This T denotes the total number of comparisons that are required in the strcmp() function if all the strings are compared with one another exactly once. So for N strings the function strcmp() will be called exactly N(N-1)/2 times. You have to calculate the total number of comparisons inside the strcmp() function in those calls. You can assume that the value of T will fit safely in a 64-bit signed integer. Please note that the most straightforward solution (worst-case complexity O(N² × 1000)) will time out for this problem.
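The straightforward solution that the statement warns about is still handy as a reference for checking a faster program on small inputs. A minimal sketch follows, assuming the strings are already read into a vector. The names pairCost and bruteForceTotal are hypothetical. The per-pair formula follows from the strcmp() code in the statement: a pair whose longest common prefix has length k costs 2k+1 comparisons if the strings differ and 2k+2 if they are identical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Comparisons made by the problem's strcmp() on one pair (hypothetical helper).
long long pairCost(const std::string &a, const std::string &b) {
    std::size_t k = 0;  // length of the longest common prefix
    while (k < a.size() && k < b.size() && a[k] == b[k]) ++k;
    if (k == a.size() && k == b.size())
        return 2 * static_cast<long long>(k) + 2;  // identical strings
    return 2 * static_cast<long long>(k) + 1;      // strings differ at position k
}

// O(N^2 * L) reference: sums pairCost over all N(N-1)/2 unordered pairs.
long long bruteForceTotal(const std::vector<std::string> &words) {
    long long total = 0;
    for (std::size_t i = 0; i < words.size(); ++i)
        for (std::size_t j = i + 1; j < words.size(); ++j)
            total += pairCost(words[i], words[j]);
    return total;
}
```

On the four sample strings cat, hat, mat and sir, every pair differs at the first character, so each of the 6 pairs costs one comparison and the total is 6.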

Sample Input

cat
hat
mat
sir

Output for Sample Input

Case 1: 1
Case 2: 6



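The answer is computed while the strings are being inserted: once the last string is in the trie, the answer is ready.
Each time the insertion moves one level down the trie, it adds that level's contribution exactly once, so nothing is counted twice.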
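The counting in the code below relies on a per-pair cost that can be read off the strcmp() code in the statement. As a sketch, with k denoting the length of the longest common prefix of s and t (the symbols cost, k and s_i are my own notation, not the problem's):

```latex
\[
\operatorname{cost}(s,t) =
\begin{cases}
2k + 1, & s \neq t,\\
2k + 2, & s = t,
\end{cases}
\qquad
T = \sum_{1 \le i < j \le N} \operatorname{cost}(s_i, s_j).
\]
```

During insertion this splits into three parts: 1 charged at the root for every earlier string, 2 charged at every trie node shared with an earlier string, and 1 more for every earlier string that is identical to the new one.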
#include <stdio.h>
#include <string.h>
#include <algorithm>
#include <vector>
#include <ctype.h>
#include <stack>
#include <iostream>
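using namespace std;
const int Max =4000*1000+10; // upper bound on trie nodes: 4000 strings of up to 1000 characters
int son[Max];  // son[u]: first (leftmost) child of node u, 0 if none
int bro[Max];  // bro[u]: next (right) sibling of node u, 0 if none
char ch[Max];  // ch[u]: character stored on node u
int val[Max];  // val[u]: number of inserted strings that pass through node u
int flag[Max]; // flag[u]: number of inserted strings that end exactly at node u
int cnt;       // number of nodes allocated so far (node 0 is the root)
long long ans; // total number of comparisons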
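void insert(const char *s)
{
    // Every earlier string costs at least one comparison against s:
    // the very first s[i]==t[i] test.
    ans+=val[0];
    val[0]++;
    int i,n=strlen(s);
    int u=0;
    int v=0;
    for(i=0;i<n;i++)
    {
        // Look for s[i] among the children of u (a linked list of siblings).
        bool found=false;
        for(v=son[u];v!=0;v=bro[v])
            if(ch[v]==s[i])
             {
                 found=true;
                 break;
             }
        if(!found)
        {
            // Allocate a new node and push it to the front of u's child list.
            v=cnt++;
            val[v]=0;
            flag[v]=0;
            ch[v]=s[i];
            bro[v]=son[u];
            son[u]=v;
            son[v]=0;
        }
        u=v;
        // Each earlier string passing through u shares this character with s,
        // which costs two more comparisons: s[i]=='\0' and the next s[i]==t[i].
        ans+=val[u]*2;
        val[u]++;
    }
    // Each earlier copy of the same string costs one extra comparison (the
    // final s[i]=='\0' test). Without this term the solution kept getting WA.
    // For example, three copies of "a" need 12 comparisons in total.
    ans+=flag[v];
    flag[v]++;
}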
char str[1010];

int main()
{
   //  freopen("in.txt","r",stdin);
    int n,num=1;
    while(scanf("%d",&n)==1&&n) // stop on the terminating 0 or on end of input
    {
        ans=0;
        cnt=1;
        val[0]=0;
        son[0]=bro[0]=0;
        while(n--)
        {
            scanf("%s",str);
            insert(str);
        }
        printf("Case %d: %lld\n",num++,ans);
    }
    return 0;
}
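//Below is the reference code by Rujia Liu. It is more heavily engineered. My basic approach follows his, and since this was my first time using the left-child right-sibling representation, I borrowed that from his code as well.
//Still, I find my own approach easier to understand and simpler to implement, although it is only slightly faster.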
/*
// UVa11732 strcmp() Anyone?
// Rujia Liu
#include<cstring>
#include<vector>
using namespace std;

const int maxnode = 4000 * 1000 + 10;
const int sigma_size = 26;

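// Trie stored in left-child right-sibling form (the original comment describes a lowercase-letter alphabet)
struct Trie {
  int head[maxnode]; // head[i]: index of the left child of node i
  int next[maxnode]; // next[i]: index of the right sibling of node i
  char ch[maxnode];  // ch[i]: character stored on node i
  int tot[maxnode];  // tot[i]: total number of leaves (inserted strings) in the subtree rooted at node i
  int sz; // total number of nodes
  long long ans; // the answer
  void clear() { sz = 1; tot[0] = head[0] = next[0] = 0; } // initially there is only the root

  // Insert string s (including the terminating '\0'), updating tot along the way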
  void insert(const char *s) {
    int u = 0, v, n = strlen(s);
    tot[0]++;
    for(int i = 0; i <= n; i++) {
      // look for character s[i]
      bool found = false;
      for(v = head[u]; v != 0; v = next[v])
        if(ch[v] == s[i]) { // found
          found = true;
          break;
        }
      if(!found) {
        v = sz++; // create a new node
        tot[v] = 0;
        ch[v] = s[i];
        next[v] = head[u];
        head[u] = v; // insert at the head of the list
        head[v] = 0;
      }
      u = v;
      tot[u]++;
    }
  }

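  // Sum the comparison counts over all pairs of words whose LCP is node u
  void dfs(int depth, int u) {
    if(head[u] == 0) // leaf
      ans += (long long)tot[u] * (tot[u] - 1) * depth; // 64-bit: the product can exceed the int range
    else {
      long long sum = 0; // 64-bit so that the product below cannot overflow
      for(int v = head[u]; v != 0; v = next[v])
        sum += tot[v] * (tot[u] - tot[v]); // pick one string in subtree v and another in a different subtree
      ans += sum / 2 * (2 * depth + 1); // divide by 2 because each pair was counted twice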
      for(int v = head[u]; v != 0; v = next[v])
        dfs(depth+1, v);
    }
  }

  // Compute the total
  long long count() {
    ans = 0;
    dfs(0, 0);
    return ans;
  }
};

#include<cstdio>
const int maxl = 1000 + 10;   // maximum length of a word

int n;
char word[maxl];
Trie trie;

int main() {
  int kase = 1;
  while(scanf("%d", &n) == 1 && n) {
    trie.clear();
    for(int i = 0; i < n; i++) {
      scanf("%s", word);
      trie.insert(word);
    }
    printf("Case %d: %lld\n", kase++, trie.count());
  }
  return 0;
}

*/