LintCode 550. Top K Frequent Words II

Difficulty: Hard

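Problem description

Find the k most frequently used words in a real-time data stream.
Implement three methods of the TopK class:
TopK(k): the constructor
add(word): add a new word
topk(): return the current k most frequently used words.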

Examples

Example 1:

Input:
TopK(2)
add("lint")
add("code")
add("code")
topk()
Output: ["code", "lint"]
Explanation:
"code" appears twice and "lint" appears once, so they are the two most frequent words.

Example 2:

Input:
TopK(1)
add("aa")
add("ab")
topk()
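Output: ["aa"]
Explanation:
"aa" and "ab" each appear once, but "aa" is lexicographically smaller than "ab".

Notes
If two words have the same frequency, rank them in lexicographic order.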
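
The ranking rule can be written down as a comparison function: higher frequency first, and on equal frequency the lexicographically smaller word first. The following is a minimal sketch of that rule only, not of the full class; `ranksBefore` and `rankWords` are hypothetical helper names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: true when entry `a` should rank before entry `b`.
// Each entry is a (frequency, word) pair.
bool ranksBefore(const std::pair<int, std::string>& a,
                 const std::pair<int, std::string>& b) {
    if (a.first != b.first) return a.first > b.first;  // higher frequency first
    return a.second < b.second;                        // tie: smaller word first
}

// Hypothetical helper: sort the entries by the ranking rule and return the words in order.
std::vector<std::string> rankWords(std::vector<std::pair<int, std::string>> entries) {
    std::sort(entries.begin(), entries.end(), ranksBefore);
    std::vector<std::string> words;
    for (const auto& e : entries) words.push_back(e.second);
    return words;
}
```

Applied to the counts from Example 1, (1, "lint") and (2, "code"), this ordering puts "code" ahead of "lint"; applied to Example 2, it puts "aa" ahead of "ab".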

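Approach

The first instinct for a top-k problem is to build a heap. This problem has a twist that brings it closer to real-world use: element frequencies are updated dynamically.
Consider the following options:

Maintain a hashmap, update it on every add, and build a heap on every topk query. Each query then has to scan every distinct word, which is too expensive when queries are frequent.
Maintain both a hashmap and a heap, and update both on every add. If the heap size is unbounded, it takes too much space. If it is capped at k, then for TopK(2) add("a") add("b") add("a") the answer comes out as ["a", "a"], because the entry popped first is (1,"b") rather than the stale (1,"a"), so an element is duplicated.
It would be convenient to have a heap with an increaseKey API that updates the value of a given element in place, much like a Fibonacci heap supports decreaseKey directly.
Since a plain heap does not work, pack the key and value together into a single key and store it in an ordered set (TreeSet in Java, std::set in C++). Updating a frequency then becomes an erase and an insert on that combined key.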
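
To make the duplicate problem of the size-capped heap concrete, here is a minimal sketch of that flawed scheme. It assumes a `std::priority_queue` whose top is the worst-ranked entry; `NaiveHeapTopK`, `Entry`, and `BetterFirst` are hypothetical names, and the class is deliberately incorrect.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;  // (frequency, word)

// Orders the heap so that top() is the worst-ranked entry:
// lowest frequency, and on ties the lexicographically larger word.
struct BetterFirst {
    bool operator()(const Entry& a, const Entry& b) const {
        if (a.first != b.first) return a.first > b.first;
        return a.second < b.second;
    }
};

// Hypothetical class showing the flawed scheme; NOT a correct solution.
class NaiveHeapTopK {
public:
    explicit NaiveHeapTopK(int k) : k_(k) {}

    void add(const std::string& word) {
        int count = ++counts_[word];
        // The older (count - 1, word) entry cannot be removed from the heap,
        // so it stays behind as a stale duplicate.
        heap_.emplace(count, word);
        while (static_cast<int>(heap_.size()) > k_) heap_.pop();
    }

    std::vector<std::string> topk() const {
        auto copy = heap_;  // drain a copy so the query leaves the heap intact
        std::vector<std::string> words;
        while (!copy.empty()) {
            words.push_back(copy.top().second);
            copy.pop();
        }
        std::reverse(words.begin(), words.end());  // worst-first to best-first
        return words;
    }

private:
    int k_;
    std::unordered_map<std::string, int> counts_;
    std::priority_queue<Entry, std::vector<Entry>, BetterFirst> heap_;
};
```

With k = 2 and the sequence add("a"), add("b"), add("a"), the heap briefly holds (1,"a"), (1,"b"), and (2,"a"). The entry (1,"b") is the worst ranked and is popped, leaving two entries for "a".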

Reference code

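#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

// Orders (count, word) pairs: higher count first, ties broken by lexicographic order.
struct comparator {
    bool operator () (const pair<int,string>& lhs, const pair<int,string>& rhs) const {
        if (lhs.first != rhs.first) return lhs.first > rhs.first;
        return lhs.second < rhs.second;
    }
};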

class TopK {
public:
    /*
     * @param k: An integer
     */
    TopK(int k) {
        // do initialization if necessary
        this->k = k;
    }

    /*
     * @param word: A string
     * @return: nothing
     */
    void add(string &word) {
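        if (k <= 0) return;

        wc[word]++;
        // Replace the word's stale (count - 1, word) entry with the updated one.
        if (wc[word] > 1) st.erase(make_pair(wc[word]-1, word));
        st.insert(make_pair(wc[word], word));
        // Keep only the k best entries; the last element is the lowest ranked.
        while (static_cast<int>(st.size()) > k) st.erase(--st.end());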
    }

    /*
     * @return: the current top k frequent words.
     */
    vector<string> topk() {
        // st holds at most k entries, already in rank order.
        if (k <= 0) return {};

        vector<string> res;
        set<pair<int, string>, comparator>::iterator it = st.begin();
        for (int i=0; i<k; i++) {
            if (it == st.end()) break;
            res.push_back((*it).second);
            it++;
        }
        return res;
    }
private:
    unordered_map<string, int> wc;          // word -> occurrence count
    set<pair<int, string>, comparator> st;  // current top-k (count, word) entries, best first
    int k;
};

Alternatively, use set<string> directly, with a comparator that looks up the counts:

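// The comparator needs the counts, so the map is a global here;
// as a result, all TopK instances share it.
unordered_map<string, int> wc;
struct comparator2 {
    bool operator () (const string& lhs, const string& rhs) const {
        if (wc[lhs] != wc[rhs]) return wc[lhs] > wc[rhs];
        return lhs < rhs;
    }
};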

class TopK {
public:
    /*
     * @param k: An integer
     */
    TopK(int k) {
        // do initialization if necessary
        this->k = k;
    }

    /*
     * @param word: A string
     * @return: nothing
     */
    void add(string &word) {
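        if (k <= 0) return;
        // Erase before incrementing: the comparator reads wc, so the count of a
        // word must not change while that word is still in the set.
        if (st.find(word) != st.end()) st.erase(word);
        wc[word]++;
        st.insert(word);
        // Keep only the k best words; the last element is the lowest ranked.
        while (static_cast<int>(st.size()) > k) st.erase(--st.end());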
    }

    /*
     * @return: the current top k frequent words.
     */
    vector<string> topk() {
        // st holds at most k words, already in rank order.
        if (k <= 0) return {};

        return vector<string>(st.begin(), st.end());
    }
private:
    // wc is declared globally above so that comparator2 can read it.
    set<string,comparator2> st;  // current top-k words, best first
    int k;
};
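
The set<string> version relies on a global wc, so every TopK instance shares one count map. One possible way around this, sketched here under the assumption that a stateful comparator is acceptable, is to give the comparator a pointer to a map owned by the object. `ByCountThenWord` and `TopKLocal` are hypothetical names, and add takes a const reference here rather than the string& of the LintCode signature.

```cpp
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical comparator that reads counts through a pointer instead of a global map.
struct ByCountThenWord {
    const std::unordered_map<std::string, int>* counts;

    int countOf(const std::string& w) const {
        auto it = counts->find(w);
        return it == counts->end() ? 0 : it->second;
    }

    bool operator()(const std::string& a, const std::string& b) const {
        int ca = countOf(a), cb = countOf(b);
        if (ca != cb) return ca > cb;  // higher count first
        return a < b;                  // tie: lexicographic order
    }
};

class TopKLocal {
public:
    explicit TopKLocal(int k) : k_(k), ranked_(ByCountThenWord{&counts_}) {}

    // A copy would keep a comparator pointing at the original object's map.
    TopKLocal(const TopKLocal&) = delete;
    TopKLocal& operator=(const TopKLocal&) = delete;

    void add(const std::string& word) {
        if (k_ <= 0) return;
        ranked_.erase(word);  // must happen before the count changes
        ++counts_[word];
        ranked_.insert(word);
        while (static_cast<int>(ranked_.size()) > k_) {
            ranked_.erase(std::prev(ranked_.end()));  // drop the lowest-ranked word
        }
    }

    std::vector<std::string> topk() const {
        return std::vector<std::string>(ranked_.begin(), ranked_.end());
    }

private:
    int k_;
    std::unordered_map<std::string, int> counts_;    // word -> occurrence count
    std::set<std::string, ByCountThenWord> ranked_;  // current top-k words, best first
};
```

As in the global version, the erase has to come before the increment, because the set locates the word using its current count. Copying is disabled because a copied comparator would still point at the original object's map.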