Assignment 3: Individual Project - Word Frequency Count

1. Requirement: implement a console program that, given an English string, counts how often each English word occurs in it.

2. Performance analysis:

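For the C++ code, run the Visual Studio performance profiler, find the performance problems, and optimize them.
For the Java program, run the profiler in NetBeans IDE 6.0, find the performance problems, and optimize them.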
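Alongside a profiler, a crude wall-clock measurement can show whether the counting loop matters at all for larger inputs. The sketch below is an assumption, not part of the assignment: the class name `WordFreqTiming`, the method `countWords`, and the repeat count of 100,000 are all hypothetical choices. It builds a large input by repeating one sample sentence and times the HashMap counting with `System.nanoTime()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical timing harness: a rough wall-clock check, not a replacement for a profiler.
public class WordFreqTiming {
    // Assumed delimiters: comma, space, curly quotes, ? . ! :, straight quotes, newline.
    static final String DELIMS = ", \u201C\u201D?.!:\"'\n";

    // Counts words of at least 4 characters, ignoring case.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer tokens = new StringTokenizer(text.toLowerCase(), DELIMS);
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken();
            if (word.length() >= 4) {
                counts.put(word, counts.getOrDefault(word, 0) + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = "Beware,beware!he'll cheat'ithout scruple,who can without fear. ";
        // Build a larger input by repetition (100,000 copies is an arbitrary choice).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append(sample);
        }
        String text = sb.toString();

        long start = System.nanoTime();
        Map<String, Integer> counts = countWords(text);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("beware:" + counts.get("beware"));
        System.out.println("elapsed ms: " + elapsedMs);
    }
}
```

A single timing run is noisy (JIT warm-up, GC), so it only hints at where time goes; a profiler is still the tool for locating hot spots.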

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.StringTokenizer;

public class 修改 {
    public static void main(String[] args) {
        // Counts how many times each word occurs
        Map<String, Integer> map = new HashMap<String, Integer>();
        String sentence = "Word is case insensitive, i.e. “file”, “FILE” "
                        + "and “File” are considered the same word.";
        // Convert uppercase letters to lowercase so that counting ignores case
        sentence = sentence.toLowerCase();
        // Split the string into tokens; words are separated by a space, a comma,
        // curly or straight quotes, ? . ! : or a newline
        StringTokenizer token = new StringTokenizer(sentence, ", “”?.!:\"'\n");
        while (token.hasMoreTokens()) {
            String word = token.nextToken();
            // Only words with at least 4 characters are counted
            if (word.length() >= 4) {
                // HashMap keys are unique, so each word has exactly one counter
                if (map.containsKey(word)) {
                    // The word has been seen before: add 1 to its count
                    map.put(word, map.get(word) + 1);
                } else {
                    // First occurrence of the word: store a count of 1
                    map.put(word, 1);
                }
            }
        }
        // Print the results
        sort(map);
    }

    // Prints "word:count" for every entry. Despite its name, this method does not
    // sort: entries appear in HashMap iteration order, which is unspecified.
    public static void sort(Map<String, Integer> map) {
        List<Map.Entry<String, Integer>> infoIds =
            new ArrayList<Map.Entry<String, Integer>>(map.entrySet());
        for (int i = 0; i < infoIds.size(); i++) {
            Entry<String, Integer> id = infoIds.get(i);
            System.out.println(id.getKey() + ":" + id.getValue());
        }
    }
}
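The `sort` method in the program prints the entries in HashMap iteration order, which Java does not specify, so the listing is not guaranteed to be ordered by frequency. If a frequency-sorted listing is wanted, one option is to sort the entry list with a `Comparator` before printing. This is a minimal sketch under that assumption; the class name `SortedWordFreq` and the method `sortedByFrequency` are hypothetical, and breaking ties alphabetically is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: orders word counts by descending frequency,
// breaking ties alphabetically by word.
public class SortedWordFreq {
    static List<Map.Entry<String, Integer>> sortedByFrequency(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(
            Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("scruple", 1);
        counts.put("cheat", 1);
        counts.put("beware", 2);
        counts.put("without", 1);
        counts.put("ithout", 1);
        counts.put("fear", 1);

        // Prints beware:2 first, then the count-1 words in alphabetical order
        for (Map.Entry<String, Integer> e : sortedByFrequency(counts)) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```

Sorting a copy of the entry list leaves the HashMap itself unchanged; a `TreeMap` would instead order by word, not by count.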

Test results:

file:3
word:2
case:1
same:1
considered:1
insensitive:1

When the input English string is "Beware,beware!he'll cheat'ithout scruple,who can without fear.", the output is:

scruple:1
cheat:1
beware:2
without:1
ithout:1
fear:1
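The entry ithout:1 follows from the delimiter set: the apostrophe is a delimiter, so cheat'ithout splits into cheat and ithout, and he'll splits into he and ll, which the length >= 4 check then drops. The short sketch below prints the raw tokens to make this visible; the class name `TokenDemo` is hypothetical, and the delimiter string is assumed to match the one in the program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo: shows how StringTokenizer splits the second test input.
public class TokenDemo {
    // Assumed delimiter set: comma, space, curly quotes, ? . ! :, straight quotes, newline.
    static final String DELIMS = ", \u201C\u201D?.!:\"'\n";

    // Returns every token (before any length filtering), lowercased.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text.toLowerCase(), DELIMS);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [beware, beware, he, ll, cheat, ithout, scruple, who, can, without, fear]
        System.out.println(tokens("Beware,beware!he'll cheat'ithout scruple,who can without fear."));
    }
}
```

Of these eleven tokens, only the six with at least 4 characters reach the HashMap, which matches the output listed above.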

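Summary: the program counts words mainly by relying on HashMap, whose keys are unique, so each word gets exactly one counter; toLowerCase makes the count case-insensitive; and token.nextToken() splits the string into individual English words. Overall, the program is easy to use.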
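The containsKey/get/put pattern described in the summary can also be written with `Map.merge` (Java 8 and later), which stores 1 for a new key and otherwise adds 1 to the existing count in a single call. A minimal sketch; the class name `MergeCount` and method `count` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class MergeCount {
    // Counts each word case-insensitively; merge() stores 1 for a new key
    // or combines the old value with 1 using Integer::sum.
    static Map<String, Integer> count(String... words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w.toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("file", "FILE", "File", "word");
        System.out.println(counts.get("file")); // prints 3
    }
}
```

Behavior is the same as the explicit branch; `merge` just avoids looking the key up twice.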

GitHub link: https://github.com/Yizhongmeng/Mengzhongyi

When I downloaded the performance analysis tool, it reported that no JDK is installed on this computer...

→_→ →_→ →_→