[LeetCode#208]Implement Trie (Prefix Tree)

Problem:

Implement a trie with insertsearch, and startsWith methods.

Note:
You may assume that all inputs are consist of lowercase letters a-z.

Analysis:

This data structure is super useful!!!
It could greatly improve the efficiency in solving some problem!!!

Big picture:
Each TrieNode has an item and a group of children. For this problem, children's size is 26 ('a' - 'z'). 
If children[i] != null, it indicates character: 'a'+i just appear at this position. <The position means the level of this node, the same as the character's index in a String>
Reference:
https://en.wikipedia.org/wiki/Trie

We construct a tree based on the inserted word. The root node could be used to represent the first character in the String.
Note: when we search a character, we could directly use children[c-'a'] to locate the information for it, which brings great convenience.  
Key: except the final word stored in item, we actually record no character, we just map chidlren[c-'a'] with character c. The way we test if a character exist at this index(level) is to check if children[c-'a'] == null. 

1. insert a word. 
public void insert(String word) {
    TrieNode node = root;
    for (char c : word.toCharArray()) {
        if (node.children[c-'a'] == null)
            node.children[c-'a'] = new TrieNode();
        node = node.children[c-'a'];
        }
    node.item = word;
}

Key part: 
1.1 we start from the first index
TrieNode node = root; <which maps to the first character in the word>
1.2 
Iff the character at the position has already appeared for other words, we directly use this child for future extension. Other wise, as the first such character(at the index), we make a new TrieNode.
if (node.children[c-'a'] == null)
    node.children[c-'a'] = new TrieNode();
Important: before begining the next loop, we have already point to the TrieNode(at that index). Thus you must update it
node = node.children[c-'a'];
Once we reached the right TrieNode, we must indicate we have word at here.
node.item = word;



2. search a word.
public boolean search(String word) {
    TrieNode node = root;
    for (char c : word.toCharArray()) {
        if (node.children[c-'a'] == null)   
            return false;
        node = node.children[c-'a'];
    }
    return node.item.equals(word);
}
We follow the idea of inserting a word. Once we could not find a character(at certain index) for a certain TrieNode, we return false.
Note: All indexed characters appear rightly in the Trie does not necessarily mean the serached the word has appeared in the Trie.
Case:
iff 'abcd' in the trie, all characters of 'abc' also in the tree.
To avoid this situation, we must compare the word we have recorded at the Trie Node.



3. serach prefix.
This is a very important characteristic of TrieNode.
It is just part of searching a word in the Trie.
public boolean startsWith(String prefix) {
        TrieNode node = root;
        for (char c : prefix.toCharArray()) {
            if (node.children[c-'a'] == null)
                return false;
            node = node.children[c-'a'];
        }
        return true;
}

Efficiency:
For a Trie, 
the cost of searching a word or preix is O(length(word or prefix)). 
the cost to insert a word is O(length(word)).

However, it could be very space efficiency.
All word share the same prefix, would share the same Trie Node for those prefix. 

The amazing part of this data structure is prefix search, which could greatly save enery in DFS search. 

Solution:

class TrieNode {
    TrieNode[] children;
    String item; 
    // Initialize your data structure here.
    public TrieNode() {
        children = new TrieNode[26];
        item = "";
    }
}

public class Trie {
    private TrieNode root;

    public Trie() {
        root = new TrieNode();
    }

    // Inserts a word into the trie.
    public void insert(String word) {
        TrieNode node = root;
        for (char c : word.toCharArray()) {
            if (node.children[c-'a'] == null)
                node.children[c-'a'] = new TrieNode();
            node = node.children[c-'a'];
        }
        node.item = word;
    }

    // Returns if the word is in the trie.
    public boolean search(String word) {
        TrieNode node = root;
        for (char c : word.toCharArray()) {
            if (node.children[c-'a'] == null)   
                return false;
            node = node.children[c-'a'];
        }
        return node.item.equals(word);
    }

    // Returns if there is any word in the trie
    // that starts with the given prefix.
    public boolean startsWith(String prefix) {
        TrieNode node = root;
        for (char c : prefix.toCharArray()) {
            if (node.children[c-'a'] == null)
                return false;
            node = node.children[c-'a'];
        }
        return true;
    }
}

 Fatal errors and Improvement:

To come up with the above solution, I have made a fatal error, which could be quite common if you have poor programming skill and habits. 
A fatal: Change the state of visited board, and make a middle return before recovering the state into right mode.
--------------------------------------------
        visited[i][j] = true;
        cur_str = cur_str + board[i][j];
        if (!prefix_tree.startsWith(cur_str))
            return; //directly return!!!
        ...
        visited[i][j] = false;
--------------------------------------------
Fix method is easy, but still ugly
if (!prefix_tree.startsWith(cur_str)) {
    visited[i][j] = false;
    return;
}

The reason for such mistake is the bad programming organization. We should avoid any return in the middle a change state, that means we should sequeeze "the change states" as minimum as possible. 
----------------------------------------------------------------------------------------
        if (i < 0 || j < 0 || i >= board.length || j >= board[0].length || visited[i][j])
            return;
        cur_str = cur_str + board[i][j];
        if (!prefix_tree.startsWith(cur_str))
            return;
        if (prefix_tree.search(cur_str))
            ret.add(cur_str);
        visited[i][j] = true;
        searchWord(board, i-1, j, cur_str, visited, prefix_tree, ret);
        searchWord(board, i+1, j, cur_str, visited, prefix_tree, ret);
        searchWord(board, i, j-1, cur_str, visited, prefix_tree, ret);
        searchWord(board, i, j+1, cur_str, visited, prefix_tree, ret);
        visited[i][j] = false;
----------------------------------------------------------------------------------------
Why not change the state, when we need to fork? (when the children branch need to know it!)

What's more, the solution is still no elegant enough!
If we don't need duplicates as return, we should try to use a hash set initally, then add all its elements into ArrayList.
The idea from:
        Set<String> hs = new HashSet<>();
        hs.addAll(ret);
        ret.clear();
        ret.addAll(hs);
        return ret;
In our code:
Set<String> ret = new HashSet<String> ();
...
if (prefix_tree.search(cur_str))
    ret.add(cur_str);
...
return new ArrayList<String> (ret);

Solution 2.2

public class Solution {
    public List<String> findWords(char[][] board, String[] words) {
        if (board == null || words == null)
            throw new IllegalArgumentException("Plese check the arguments passed into the funtion");
        Set<String> ret = new HashSet<String> ();
        Trie prefix_tree = new Trie();
        if (board.length == 0)
            return new ArrayList<String> (ret);
        boolean[][] visited = new boolean[board.length][board[0].length];
        for (int i = 0; i < words.length; i++)
            prefix_tree.insert(words[i]);
        for (int i = 0; i < board.length; i++) {
            for (int j = 0; j < board[0].length; j++) {
                searchWord(board, i, j, "", visited, prefix_tree, ret);
            }
        }
        return new ArrayList<String> (ret);
    }
    
    
    private void searchWord(char[][] board, int i, int j, String cur_str, boolean[][] visited, Trie prefix_tree, Set<String> ret) {
        if (i < 0 || j < 0 || i >= board.length || j >= board[0].length || visited[i][j])
            return;
        cur_str = cur_str + board[i][j];
        if (!prefix_tree.startsWith(cur_str))
            return;
        if (prefix_tree.search(cur_str))
            ret.add(cur_str);
        visited[i][j] = true;
        searchWord(board, i-1, j, cur_str, visited, prefix_tree, ret);
        searchWord(board, i+1, j, cur_str, visited, prefix_tree, ret);
        searchWord(board, i, j-1, cur_str, visited, prefix_tree, ret);
        searchWord(board, i, j+1, cur_str, visited, prefix_tree, ret);
        visited[i][j] = false;
    }
}

//Trie Node
class TrieNode{
    public TrieNode[] children = new TrieNode[26];
    public String item = "";
}

//Trie
class Trie{
    public TrieNode root = new TrieNode();
 
    public void insert(String word){
        TrieNode node = root;
        for(char c: word.toCharArray()){
            if(node.children[c-'a']==null){
                node.children[c-'a']= new TrieNode();
            }
            node = node.children[c-'a'];
        }
        node.item = word;
    }
 
    public boolean search(String word){
        TrieNode node = root;
        for(char c: word.toCharArray()){
            if(node.children[c-'a']==null)
                return false;
            node = node.children[c-'a'];
        }
        if(node.item.equals(word)){
            return true;
        }else{
            return false;
        }
    }
 
    public boolean startsWith(String prefix){
        TrieNode node = root;
        for(char c: prefix.toCharArray()){
            if(node.children[c-'a']==null)
                return false;
            node = node.children[c-'a'];
        }
        return true;
    }
}
原文地址:https://www.cnblogs.com/airwindow/p/4768259.html