208. 实现 Trie (前缀树) 和面试题 17.13. 恢复空格

这里写了两道题，第二道题依赖第一道题，就先分析第一道题

实现一个 Trie (前缀树)，包含 insert, search, 和 startsWith 这三个操作。

示例:

Trie trie = new Trie();

trie.insert("apple");
trie.search("apple"); // 返回 true
trie.search("app"); // 返回 false
trie.startsWith("app"); // 返回 true
trie.insert("app");
trie.search("app"); // 返回 true

解:

Trie (发音为 "try") 或前缀树是一种树数据结构，用于检索字符串数据集中的键。这一高效的数据结构有多种应用：

自动补全，拼写检查等

Trie 树的结点结构
Trie 树是一个有根的树，其结点具有以下字段：。

最多 RR 个指向子结点的链接，其中每个链接对应字母表数据集中的一个字母。
本文中假定 RR 为 26，小写拉丁字母的数量。
布尔字段，以指定节点是对应键的结尾还是只是键前缀。

向 Trie 树中插入键
我们通过搜索 Trie 树来插入一个键。我们从根开始搜索它对应于第一个键字符的链接。有两种情况：

链接存在。沿着链接移动到树的下一个子层。算法继续搜索下一个键字符。
链接不存在。创建一个新的节点，并将它与父节点的链接相连，该链接与当前的键字符相匹配。
重复以上步骤，直到到达键的最后一个字符，然后将当前节点标记为结束节点，算法完成。

复杂度分析

时间复杂度：O(m)O(m)，其中 mm 为键长。在算法的每次迭代中，我们要么检查要么创建一个节点，直到到达键尾。只需要 mm 次操作。

空间复杂度：O(m)O(m)。最坏的情况下，新插入的键和 Trie 树中已有的键没有公共前缀。此时需要添加 mm 个结点，使用 O(m)O(m) 空间。

在 Trie 树中查找键
每个键在 trie 中表示为从根到内部节点或叶的路径。我们用第一个键字符从根开始，。检查当前节点中与键字符对应的链接。有两种情况：

存在链接。我们移动到该链接后面路径中的下一个节点，并继续搜索下一个键字符。
不存在链接。若已无键字符，且当前结点标记为 isEnd，则返回 true。否则有两种可能，均返回 false :
还有键字符剩余，但无法跟随 Trie 树的键路径，找不到键。
没有键字符剩余，但当前结点没有标记为 isEnd。也就是说，待查找键只是Trie树中另一个键的前缀。

复杂度分析

时间复杂度 :
空间复杂度 :

class Trie {
private:
    Trie *next[26];
    bool bIsEnd;
public:
    /** Initialize your data structure here. */
    Trie() {
        memset(next,0,sizeof(next));
        bIsEnd=false;
    }
    
    /** Inserts a word into the trie. */
    void insert(string word) {
        Trie *ptr=this;
        for(char ch : word)
        {
            if(ptr->next[ch-'a']==nullptr)
            {
                ptr->next[ch-'a']=new Trie();
                
            }
            ptr=ptr->next[ch-'a'];
        }
        ptr->bIsEnd=true;

    }
    
    /** Returns if the word is in the trie. */
    bool search(string word) {
        Trie *ptr=this;
        for(auto ch:word)
        {
            if(ptr->next[ch-'a']!=nullptr)
            {
                ptr=ptr->next[ch-'a'];
            }
            else
            {
                
                return false;
                
            }
        }
        return ptr->bIsEnd;
    }
    
    /** Returns if there is any word in the trie that starts with the given prefix. */
    bool startsWith(string prefix) {
                Trie *ptr=this;
        for(auto ch:prefix)
        {
            if(ptr->next[ch-'a']!=nullptr)
            {
                ptr=ptr->next[ch-'a'];
            }
            else
            {
                return false;
                
            }
        }
        return true;
    }
    
};

/**
 * Your Trie object will be instantiated and called as such:
 * Trie* obj = new Trie();
 * obj->insert(word);
 * bool param_2 = obj->search(word);
 * bool param_3 = obj->startsWith(prefix);
 */

第二道题

面试题 17.13. 恢复空格

哦，不！你不小心把一个长篇文章中的空格、标点都删掉了，并且大写也弄成了小写。像句子"I reset the computer. It still didn’t boot!"已经变成了"iresetthecomputeritstilldidntboot"。在处理标点符号和大小写之前，你得先把它断成词语。当然了，你有一本厚厚的词典dictionary，不过，有些词没在词典里。假设文章用sentence表示，设计一个算法，把文章断开，要求未识别的字符最少，返回未识别的字符数。

注意：本题相对原题稍作改动，只需返回未识别的字符数

示例：

输入：
dictionary = ["looked","just","like","her","brother"]
sentence = "jesslookedjustliketimherbrother"
输出： 7
解释： 断句后为"jess looked just like tim her brother"，共7个未识别字符。

题目分析
题意：给定字符串，尽可能多地匹配字典内的单词，即最少未匹配数。

分：拿到题目，先想贪心能不能做，显然不行。比如给定字符串：abcdef，字典：["abcd", "ab", "def"]，贪心匹配是 abcd ef，有2个字符未匹配。显然有更优匹配策略： ab c def ，只有 1 个字符未匹配。所以这题需要用动态规划来解决（怎么思考一个题目可不可以用 dp 来解决，先假设 dp 表达含义，再想转移方程，很多时候自然而然就出来了）。

1、状态定义：
dp[i] 表示字符串的前 i 个字符的最少未匹配数。

2、状态转移：
假设当前我们已经考虑完了前 i 个字符了，对于前 i + 1 个字符对应的最少未匹配数：

第 i + 1 个字符未匹配，则 dp[i + 1] = dp[i] + 1，即不匹配数加 1;
遍历前 i 个字符，若以其中某一个下标 idx 为开头、以第 i + 1 个字符为结尾的字符串正好在词典里，则 dp[i] = min(dp[i], dp[idx]) 更新 dp[i]。
于是，有了解法一。

时间复杂度是 $O(n^2)，n 为待匹配字符串的长度。$

class Solution {
    public int respace(String[] dictionary, String sentence) {
        Set<String> dict = new HashSet<>(Arrays.asList(dictionary));
        int n = sentence.length();
        int[] dp = new int[n + 1];
        for (int i = 1; i <= n; i++) {
            dp[i] = dp[i - 1] + 1;
            for (int idx = 0; idx < i; idx++) {
                if (dict.contains(sentence.substring(idx, i))) {
                    dp[i] = Math.min(dp[i], dp[idx]);
                }
            }
        }
        return dp[n];
    }
}

使用 Trie 进行优化：
对于上述解法，计算 dp[i + 1]时，我们需要用 idx 来遍历前 i 个字符，逐个判断以 idx 为开头，以第 i + 1 个字符为结尾的字符串是否在字典里。
这一步可以利用字典树来加速，通过字典树我们可以查询以第 i + 1 个字符为结尾的单词有哪些（构建字典树时将单词逆序插入即可）。
关于字典树的学习，可以看下这篇题解，这就是解法二。
时间复杂度是 O(m + n^2)，m 是字典长度，n为待匹配字符串的长度。为什么还是 n^2
呢？因为有可能状态转移的时候，每个位置都需要转移，这是最坏情况，绝大多数情况下远小于 n，所以解法二最终耗时会远小于解法一。

class Solution {
    public int respace(String[] dictionary, String sentence) {
        // 构建字典树
        Trie trie = new Trie();
        for (String word: dictionary) {
            trie.insert(word);
        }
        // 状态转移，dp[i] 表示字符串的前 i 个字符的最少未匹配数
        int n = sentence.length();
        int[] dp = new int[n + 1];
        for (int i = 1; i <= n; i++) {
            dp[i] = dp[i - 1] + 1;
            for (int idx: trie.search(sentence, i - 1)) {
                dp[i] = Math.min(dp[i], dp[idx]);
            }
        }
        return dp[n];
    }
}

class Trie {
    TrieNode root;

    public Trie() {
        root = new TrieNode();
    }

    // 将单词倒序插入字典树
    public void insert(String word) {
        TrieNode cur = root;
        for (int i = word.length() - 1; i >= 0; i--) {
            int c = word.charAt(i) - 'a';
            if (cur.children[c] == null) {
                cur.children[c] = new TrieNode();
            }
            cur = cur.children[c];
        }
        cur.isWord = true;
    }

    // 找到 sentence 中以 endPos 为结尾的单词，返回这些单词的开头下标。
    public List<Integer> search(String sentence, int endPos) {
        List<Integer> indices = new ArrayList<>(); 
        TrieNode cur = root;
        for (int i = endPos; i >= 0; i--) {
            int c = sentence.charAt(i) - 'a';
            if (cur.children[c] == null) {
                break;
            }
            cur = cur.children[c];
            if (cur.isWord) {
                indices.add(i);
            }  
        }
        return indices;
    }
}

class TrieNode {
    boolean isWord;
    TrieNode[] children = new TrieNode[26];

    public TrieNode() {}
}

208. 实现 Trie (前缀树) 和 面试题 17.13. 恢复空格

面试题 17.13. 恢复空格

208. 实现 Trie (前缀树) 和面试题 17.13. 恢复空格