字典树简单知识及类实现

什么是trie树？

◇ trie树是一种用于高速检索的多叉树结构。

◇ 和二叉查找树不同。在trie树中，每一个结点上并不是存储一个元素。

◇ trie树把要查找的关键词看作一个字符序列。并依据构成关键词字符的先后顺序构造用于检索的树结构。

◇在trie树上进行检索类似于查阅英语词典。

一棵m度的trie树或者为空，或者由m棵m度的trie树构成。

比如，电子英文词典。为了方便用户高速检索英语单词，能够建立一棵trie树。

比如词典由以下的单词构成：a、b、c、aa、ab、ac、ba、ca、aba、abc、baa、bab、bac、cab、abba、baba、caba、abaca、caaba

在trie树上进行查找

比如在上面的trie树中查找单词 aba
(1)在trie树上进行检索总是始于根结点。
(2)取得要查找关键词的第一个字母（比如 a ）。并依据该字母选择相应的子树并转到该子树继续进行检索。
(3)在相应的子树上，取得要查找关键词的第二个字母（比如 b），并进一步选择相应的子树进行检索。

(4) ...
(5)在某个结点处，关键词的全部字母已被取出，则读取附在该结点上的信息，即完毕查找。

有关Trie树：

在trie树中查找一个keyword的时间和树中包括的结点数无关，而取决于组成keyword的字符数。而二叉查找树的查找时间和树中的结点数有关O(log2n)。

假设要查找的keyword能够分解成字符序列且不是非常长，利用trie树查找速度优于二叉查找树。如：
若keyword长度最大是5。则利用trie树，利用5次比較能够从265＝11881376个可能的keyword中检索出指定的keyword。而利用二叉查找树至少要进行log2265=23.5次比較。

Trie的类实现及解析：

#include <iostream>
#include <cstdio>
#include <cstring>
#include <cstdlib>
#define MAXN 10010
#define RST(N)memset(N, 0, sizeof(N))
using namespace std;

const int char_num = 26;

class Trie {
public:
    Trie();
    int trie_search(const char* word, char* entry) const;
    int insert(const char* word, char* entry);
    int remove(const char* word, char* entry);
protected:
    struct Trie_node {
        char *data;
        Trie_node *branch[char_num];  //指向各個子樹的指針
        Trie_node();
    };
    Trie_node *root;
};

Trie::Trie() : root(NULL) {};

Trie::Trie_node::Trie_node()
{
    data = NULL;
    for(int i=0; i<char_num; i++) branch[i]=NULL;
}

int Trie::trie_search(const char *word, char *entry) const  //检索
{
    int position = 0;
    char char_code;
    Trie_node *location = root;
    while(location != NULL && *word != 0) {
        if(*word>='a' && *word<='z') char_code = *word-'a';
        else if(*word>='A' && *word<='Z') char_code = *word-'A';
        else return 0;    //不合法單詞
        location = location->branch[char_code];
        position++, word++;
    }
    if(location != NULL && location->data != NULL) {
        strcpy(entry, location->data);
        return 1;
    }
    return 0;
}

int Trie::insert(const char *word, char *entry)   //插入
{
    int result = 1, position = 0;
    if(root == NULL) root = new Trie_node;
    char char_code;
    Trie_node *location = root;
    while(location != NULL && *word != 0) {
        if(*word>='a' && *word<='z') char_code = *word-'a';
        else if(*word>='A' && *word<='Z') char_code = *word-'A';
        else return 0;   //不合法單詞
        if(location->branch[char_code] == NULL)
            location->branch[char_code] = new Trie_node;
        location = location->branch[char_code];
        position++, word++;
    }
    if(location->data != NULL) result = 0;
    else {
        location->data = new char(strlen(entry)+1);
        strcpy(location->data, entry);
    }
    return result;
}


int main()
{
    Trie t;
    char entry[100];
    t.insert("a", "DET");           t.insert("abacus","NOUN");
    t.insert("abalone","NOUN");     t.insert("abandon","VERB");
    t.insert("abandoned","ADJ");    t.insert("abashed","ADJ");
    t.insert("abate","VERB");       t.insert("this", "PRON");
    if (t.trie_search("this", entry))
        cout<<"'this' was found. pos: "<<entry<<endl;
    if (t.trie_search("abate", entry))
        cout<<"'abate' is found. pos: "<<entry<<endl;
    if (t.trie_search("baby", entry))
        cout<<"'baby' is found. pos: "<<entry<<endl;
    else
        cout<<"'baby' does not exist at all!"<<endl;
    return 0;
}