第 2 章第 1 题同位词问题下问 Multimap实现

问题分析

　　输入：一个任意的单词和一个内含多个乱序单词的字典文件

　　输出：该单词在字典中的所有同位词

　　约束：允许事先对字典进行预处理

解决思路

　　上问的程序有个缺点 - 我们必须遍历完整个字典文件才能输出所有结果。现在下问允许我们事先对字典文件进行预处理，那么可以先对字典文件的单词按其标识符排序，这样相同标识符的单词都聚集在了一起，从而避免了对整个文件的检索。下面的代码用C++中的关联容器Multimap实现了这个思想。

代码实现

 1 #include <iostream>
 2 #include <fstream>
 3 #include <map>
 4 #include <string>
 5 
 6 using namespace std;
 7 
 8 #define MAX 26
 9 
10 /*
11  * 获取单词word的标识符并返回
12 */
13 string getID(string word)
14 {
15     string id(26, '0');
16     for (string::size_type i=0; i<word.length(); i++) {
17         id[word[i]-97]++;
18     }
19 
20     return id;
21 }
22 
23 int main()
24 {
25     /*
26      * 打开字典文件
27     */
28     string filename;
29     cout << "请输入字典文件名( 当前目录下 ): ";
30     cin >> filename;
31 
32     fstream io;
33     io.open(filename.c_str());
34     if (!io) {
35         cout << "打开文件失败" << endl;
36         return 1;
37     }
38 
39     /*
40      * 获取查询单词及其标识符
41     */
42     string word;
43     cout << "请输入查询单词: ";
44     cin >> word;
45     string wordID = getID(word);
46 
47     /*
48      * 将字典文件存放进关联容器
49     */
50     multimap<string, string> m;
51     string first, second;
52     while (io >> second) {
53         first = getID(second);
54         m.insert(make_pair(first, second));
55     }
56     io.close();
57 
58     /*
59      * 检索关联容器并打印检索结果
60     */
61     multimap<string, string> :: iterator it1, it2;
62     it1 = m.lower_bound(wordID);
63     it2 = m.upper_bound(wordID);
64     while (it1->first != it2->first) {
65         cout << it1->second << endl;
66         it1++;
67     }
68 
69     // 关闭文件指针
70     io.close();
71     
72     return 0;
73 }

运行测试

测试所用字典文件：

运行结果：

说明

　　当字典文件中单词数量达到千万级别的时候，程序运行异常（很占CPU和内存且耗时巨大，而上问用的程序依然运行良好）。难道multimap容器不适合处理大批量的数据？原因仍在思考中读者若有思路欢迎与我联系... ...

第 2 章 第 1 题 同位词问题 下问 Multimap实现

第 2 章第 1 题同位词问题下问 Multimap实现