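Small program: counting the keywords in Java source code

The exercise comes from 《Java语言程序设计(8E)》 (Introduction to Java Programming, 8th edition).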

　22.3**(Counting the keywords in Java source code) Write a program that reads a Java
source-code file and reports the number of keywords (including null, true, and
false) in the file. Pass the Java file name from the command line.
(Hint: Create a set to store all the Java keywords.)

1. Create a hash set containing the Java keywords

    Set<String> keyTable = new HashSet<String>();

Add the 50 keywords plus the 3 literals (null, true, false) to the set.
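As a rough sketch of this step, the set could be filled from an array literal. The class name `KeywordTable` and the method `build` are illustrative names, not part of the original program; the list is the 50 reserved words of Java SE 5 through 8 plus the three literals the exercise asks for.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; the names are assumptions made for illustration.
final class KeywordTable {
    private static final String[] KEYWORDS = {
        "abstract", "assert", "boolean", "break", "byte", "case", "catch",
        "char", "class", "const", "continue", "default", "do", "double",
        "else", "enum", "extends", "final", "finally", "float", "for",
        "goto", "if", "implements", "import", "instanceof", "int",
        "interface", "long", "native", "new", "package", "private",
        "protected", "public", "return", "short", "static", "strictfp",
        "super", "switch", "synchronized", "this", "throw", "throws",
        "transient", "try", "void", "volatile", "while",
        // literals that the exercise asks to count as keywords
        "null", "true", "false"
    };

    // Builds a fresh set holding all 53 entries.
    static Set<String> build() {
        return new HashSet<String>(Arrays.asList(KEYWORDS));
    }
}
```

Keeping the words in one array makes it easy to check the list against the language specification before the set is built.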

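2. Read from the file

My first attempt with FileReader was flagged as an error:

FileReader

    File file = new File("E:\\Files\\WorkStation\\chapter22\\src\\Sample\\SetListPerformedTest.java");
    BufferedReader br = null;
    // Eclipse flags the next two lines: their checked exceptions are not handled
    br = new BufferedReader(new FileReader(file));
    br.close();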

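Searching online, I found that the constructor FileReader(File file) throws FileNotFoundException not only when the path does not exist, but also when the path exists and denotes a directory. FileNotFoundException is neither a RuntimeException nor an Error; it is a checked exception. So whether or not the file actually exists, the compiler (and therefore Eclipse) requires this exception to be handled.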
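The directory case is easy to reproduce. The sketch below assumes that the `java.io.tmpdir` system property points to an existing directory, which is normally the case; `DirectoryOpenDemo` and `opensAsFile` are names made up for this illustration.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

final class DirectoryOpenDemo {
    // Returns true if the path can be opened for reading as a regular file.
    static boolean opensAsFile(File path) {
        try {
            FileReader reader = new FileReader(path);
            reader.close();
            return true;
        } catch (FileNotFoundException e) {
            // Thrown for a missing path and also for an existing directory.
            return false;
        } catch (IOException e) {
            // close() failed after a successful open.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: java.io.tmpdir names a directory that exists.
        File dir = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("exists: " + dir.exists() + ", opens as file: " + opensAsFile(dir));
    }
}
```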

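The modified code follows. Since br.close() can throw IOException, which is also a checked exception, it gets a catch clause as well.

ModifiedFileReader

    File file = new File("E:\\Files\\WorkStation\\chapter22\\src\\Sample\\SetListPerformedTest.java");
    BufferedReader br = null;
    try
    {
        br = new BufferedReader(new FileReader(file));
        // ... read from br here ...
        br.close();
    } catch (FileNotFoundException e) {
        // the path does not exist or denotes a directory
        e.printStackTrace();
    } catch (IOException e) {
        // thrown by close() (and by readLine() once reading is added)
        e.printStackTrace();
    }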
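Since the exercise asks for the file name on the command line, the hard-coded path could be replaced by `args[0]`. This is a minimal sketch, assuming Java 7 or later for try-with-resources; `SourceLines` and `readLines` are illustrative names, not part of the original program.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class SourceLines {
    // Reads every line of the file; returns null if it cannot be read.
    static List<String> readLines(String fileName) {
        List<String> lines = new ArrayList<String>();
        // try-with-resources closes the reader even if readLine() fails
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (FileNotFoundException e) {
            System.err.println("Cannot open " + fileName + ": " + e.getMessage());
            return null;
        } catch (IOException e) {
            System.err.println("Error while reading " + fileName + ": " + e.getMessage());
            return null;
        }
        return lines;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java SourceLines <file.java>");
            return;
        }
        List<String> lines = readLines(args[0]);
        if (lines != null) {
            System.out.println(lines.size() + " lines read");
        }
    }
}
```

Catching FileNotFoundException before IOException keeps the two failure messages separate; a single `catch (IOException e)` would also compile, because FileNotFoundException is a subclass of IOException.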

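3. Split the text with a regular expression

A regular expression is a single string that describes or matches a set of strings conforming to some syntactic rule.

The regular expression [^a-zA-Z] matches any single character that is not an English letter. Passing it to stringname.split() breaks the string apart at every such character, which separates the words one by one. Storing each piece in an ArrayList (named words) completes the word extraction.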

The code:

CatchWords

    String tempString = null;
    while ((tempString = br.readLine()) != null)
    {
        // split the line at every non-letter character
        String[] wordLine = tempString.split("[^a-zA-Z]");

        for (int i = 0; i < wordLine.length; i++)
        {
            words.add(wordLine[i]);
        }
    }

But inspecting the words shows that many of them are the empty string "". In the example I used, words[4]="util", words[5]="" and words[6]="public".

The original text is:

import java.util.*;

public

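There is a blank line in between, and a blank line splits into a single empty string. Many of the other empty strings do not come from blank lines: every additional non-letter character, such as each space of indentation or the second of two adjacent punctuation marks, also produces one. So how do we get the real word list? We overlooked String's length() method, which makes it easy to skip the empty strings:

    if (wordLine[i].length() > 0)
        words.add(wordLine[i]);

Now words contains nothing but actual words.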
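To see where the empty strings come from: `split` drops trailing empty strings but keeps leading and interior ones, and a string with no match at all (including an empty line) comes back as a single element. The sketch below shows this together with the length() filter; `WordSplitter` and `extractWords` are names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class WordSplitter {
    // Splits one line on non-letters and keeps only the non-empty tokens.
    static List<String> extractWords(String line) {
        List<String> result = new ArrayList<String>();
        for (String token : line.split("[^a-zA-Z]")) {
            if (token.length() > 0) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // An empty line yields one empty token.
        System.out.println("".split("[^a-zA-Z]").length);          // 1
        // Each leading space yields one empty token before "int".
        System.out.println("  int x;".split("[^a-zA-Z]").length);  // 4
        // The filter leaves only the real words.
        System.out.println(extractWords("import java.util.*;"));   // [import, java, util]
    }
}
```

Using "[^a-zA-Z]+" as the pattern would collapse runs of separators, but a line that starts with a non-letter would still produce one leading empty token, so the length() check is needed either way.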

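4. Detect the keywords

After all that preparation we finally reach the core of the program, and by now it is not hard to implement. My first idea was to compare each word in the text against the keyword table; I then found that doing it the other way round saves time. The text is likely to contain repeated keywords, and as soon as a keyword matches one word in the text we can break out of the inner loop. Note that each keyword is therefore counted at most once, so count ends up as the number of distinct keywords that occur in the file.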

The code:

CountKeyword

    int count = 0;
    for (String key : keyTable)
    {
        for (String w : words)
        {
            if (key.equals(w))
            {
                // this keyword occurs in the text; count it once and move on
                count++;
                break;
            }
        }
    }
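Because keyTable is a HashSet, membership can also be tested with `contains`, which removes the need for an inner loop. The sketch below is an alternative, not the original program: `countOccurrences` counts every keyword occurrence, repeats included, while `countDistinct` gives the same result as the nested loops above. `KeywordCounter` and both method names are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class KeywordCounter {
    // Counts every word that is a keyword, repeats included.
    static int countOccurrences(List<String> words, Set<String> keyTable) {
        int count = 0;
        for (String w : words) {
            if (keyTable.contains(w)) {
                count++;
            }
        }
        return count;
    }

    // Counts how many different keywords appear at least once.
    static int countDistinct(List<String> words, Set<String> keyTable) {
        Set<String> seen = new HashSet<String>(words);
        seen.retainAll(keyTable);
        return seen.size();
    }

    public static void main(String[] args) {
        // A deliberately tiny keyword table, just for the demonstration.
        Set<String> keyTable = new HashSet<String>(Arrays.asList("public", "int", "class"));
        List<String> words = Arrays.asList("public", "class", "A", "int", "x", "int", "y");
        System.out.println(countOccurrences(words, keyTable)); // 4
        System.out.println(countDistinct(words, keyTable));    // 3
    }
}
```

Which of the two numbers the exercise wants depends on how "the number of keywords" is read, so it may be worth printing both.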

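5. Summary

The idea behind this exercise is simple. I got stuck on extracting the words, and once I learned about regular expressions everything fell into place.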