Assignment 3

1. Goals

Use a development tool (Eclipse or Visual Studio)
Use a programming language (C, C++, C# or Java)
Use a source control tool (GitHub)

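2. Requirements

(1). Implement a console program that, given an English string, counts how often each English word in it occurs (only words of 4 or more characters).

Output requirement: print each word with its frequency in descending order of frequency; words with the same frequency are printed in alphabetical order, as shown below: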

file: 3
word: 2
case: 1
considered: 1
insensitive: 1
same: 1
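
The ordering rule above (frequency descending, then alphabetical) can be sketched as a custom comparator. This is only a minimal sketch: the names `WordCount` and `rank_words` are made up for illustration, and it assumes the words have already been extracted and lower-cased.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias: a word together with its number of occurrences.
using WordCount = std::pair<std::string, int>;

// Hypothetical helper: counts the given (already lower-cased) words and
// returns them ordered by frequency descending, then alphabetically.
std::vector<WordCount> rank_words(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const std::string& w : words) {
        ++counts[w];
    }

    std::vector<WordCount> ranked(counts.begin(), counts.end());
    std::sort(ranked.begin(), ranked.end(),
              [](const WordCount& a, const WordCount& b) {
                  if (a.second != b.second) {
                      return a.second > b.second;  // higher frequency first
                  }
                  return a.first < b.first;        // ties: alphabetical
              });
    return ranked;
}
```

Each entry can then be printed as `word: count`, one per line, to match the format shown above.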

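Additional requirement: read in a text file, count the frequency of the words in that file, and output the statistics according to the requirement above.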
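
For the file-input part, one possible approach is to read the whole file into a `std::string` and then count words on that string. The sketch below assumes the file is small enough to fit in memory; `read_all` and `read_file` are hypothetical helper names, not part of the assignment.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads everything that is left in the stream.
std::string read_all(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Hypothetical helper: reads the file at `path` into `out`.
// Returns false when the file cannot be opened.
bool read_file(const std::string& path, std::string& out) {
    std::ifstream file(path);
    if (!file) {
        return false;
    }
    out = read_all(file);
    return true;
}
```

Taking a `std::istream&` in `read_all` keeps the reading logic usable with `std::cin` or a string stream, which makes it easier to unit test.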

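(2). Performance analysis:

For C++ code, run the VS performance profiling tool, find the performance problems, and optimize them.
For a Java program, run the profiling tool in NetBeans IDE 6.0, find the performance problems, and optimize them.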

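3. What to submit:

(1). GitHub: check the code in to GitHub

Make sure the results are correct
No warnings
Good performance
Make sure the unit tests pass

(2). Blog: submit a blog post

Before starting the project, estimate how much time each functional module will take
After finishing the project, record how much time you actually spent on each functional module
Describe how much time you spent improving the program's performance, and use the VS profiling tool to show your performance charts
Show your unit test results and explain how you made sure they are correct
What you learned from this exercise
Write in the blog: "Which part of the design of this program are you proudest of, and which is the most distinctive? How did you come up with it? Where was the biggest bug? Please paste part of the code and explain it" (ZX_Proposal)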

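4. Assignment hints

(1). Definitions

Letter: A-Z, a-z.
Alphanumeric: A-Z, a-z, 0-9.
Separator: any non-alphanumeric character
Word:
Contains 4 or more letters
Words are delimited by separators
A string that contains a _non_-alphanumeric character is not a word
Words are case-insensitive; for example, “file”, “FILE” and “File” count as the same word
A word must start with a letter: “file123” is a word, “123file” is not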
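
The definitions above could be turned into a tokenizer roughly as follows. This is a sketch under one assumption: "4 or more" is read as a total length of at least 4 characters, so “file123” passes; a stricter reading that requires four letters would need a different check. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// ASCII-only checks, so the bytes of multi-byte characters (such as
// curly quotes in UTF-8 text) are treated as separators.
static bool is_letter(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static bool is_alnum(char c) {
    return is_letter(c) || (c >= '0' && c <= '9');
}

// Hypothetical helper: returns the lower-cased words found in `text`.
std::vector<std::string> extract_words(const std::string& text) {
    std::vector<std::string> words;
    std::size_t i = 0;
    while (i < text.size()) {
        // Skip separators.
        if (!is_alnum(text[i])) {
            ++i;
            continue;
        }
        // Collect one maximal run of alphanumeric characters.
        std::size_t start = i;
        while (i < text.size() && is_alnum(text[i])) {
            ++i;
        }
        std::string token = text.substr(start, i - start);
        // Assumed rule: starts with a letter, at least 4 characters long.
        if (token.size() >= 4 && is_letter(token[0])) {
            for (char& c : token) {
                if (c >= 'A' && c <= 'Z') {
                    c = static_cast<char>(c - 'A' + 'a');
                }
            }
            words.push_back(token);
        }
    }
    return words;
}
```

Lower-casing every accepted token is what makes “file”, “FILE” and “File” end up as the same word when they are counted later.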

(2). Example

Input

Word is case insensitive, i.e. “file”, “FILE” and “File” are considered the same word.

Output

file: 3
word: 2
case: 1
considered: 1
insensitive: 1
same: 1

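(3). Reference resources

Textbook Section 2.2, Performance Analysis Tools, Code Listings 2-6 and 2-7.
You can also refer to the following links:
- Word frequency count (lessons learned)
- Word frequency count project: preparation work and actual completion
- Introduction to Java regular expressions
- Introduction to C++ regular expressions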

5. Code

#include <cstddef>
#include <iostream>
#include <string>

using std::cout;
using std::endl;

// Function: count how many times the substring sub occurs in str.
// The search resumes one character after each hit, so overlapping
// occurrences are counted as well.
int fun(const std::string& str, const std::string& sub)
{
    int num = 0;
    for (std::size_t i = 0; (i = str.find(sub, i)) != std::string::npos; num++, i++);
    return num;
}

int main()
{
    std::string str("Many of my classmates have a computer, I have one too. My father bought it for me as a present when my first year in middle school. He said I can study English with computer. Most of the time, I use computer to search study materials on the internet. I also have some foreign friends on the internet, we can talk in English. Sometimes I play video game with computer after I finish my homework. My computer helps me a lot, it is a good friend to me.");
    std::string sub("computer");
    cout << fun(str, sub) << endl;
    return 0;
}

6. Result

7. Summary

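At first I wanted to do this with KMP matching, but I could not get it debugged and could not work a counter into it, so I gave up and ended up using this crude approach.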
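
For reference, here is one way a counter might be worked into KMP. This is a sketch, not the code used for this assignment: `build_failure` and `kmp_count` are hypothetical names, and overlapping occurrences are counted, as in the `std::string::find` loop of section 5.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// fail[i] is the length of the longest proper prefix of sub[0..i]
// that is also a suffix of sub[0..i].
std::vector<std::size_t> build_failure(const std::string& sub) {
    std::vector<std::size_t> fail(sub.size(), 0);
    std::size_t k = 0;
    for (std::size_t i = 1; i < sub.size(); ++i) {
        while (k > 0 && sub[i] != sub[k]) {
            k = fail[k - 1];
        }
        if (sub[i] == sub[k]) {
            ++k;
        }
        fail[i] = k;
    }
    return fail;
}

// Counts the occurrences of sub in str, overlapping ones included.
// An empty pattern is treated as zero occurrences in this sketch.
int kmp_count(const std::string& str, const std::string& sub) {
    if (sub.empty()) {
        return 0;
    }
    const std::vector<std::size_t> fail = build_failure(sub);
    int num = 0;
    std::size_t k = 0;  // number of pattern characters matched so far
    for (std::size_t i = 0; i < str.size(); ++i) {
        while (k > 0 && str[i] != sub[k]) {
            k = fail[k - 1];
        }
        if (str[i] == sub[k]) {
            ++k;
        }
        if (k == sub.size()) {
            ++num;            // the counter goes here: a full match ended at i
            k = fail[k - 1];  // fall back so the scan can continue
        }
    }
    return num;
}
```

The counter sits at the point where the matched length reaches the pattern length; falling back through the failure table afterwards lets the scan continue without restarting from the beginning of the pattern.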