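IO Class Test

1. User requirements

How are the frequencies of the 26 English letters distributed in a novel? Which words appear most often in a given type of article? Which words does a particular author use most? What are the most common phrases in Harry Potter and the Sorcerer's Stone? And so on.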

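(1) Requirement 1:

Output the frequency of each of the 26 letters in an English text file, sorted from highest to lowest, and show each letter's percentage to two decimal places.

(Notes: 1. Letter frequency = number of occurrences of the letter / total number of occurrences of all letters A-Z and a-z.

2. If two letters have the same frequency, they are listed in dictionary order.)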
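To make the formula in note 1 concrete, here is a minimal sketch of the percentage calculation. The class name LetterPercent, the method percent, and the sample counts are made up for illustration and are not part of the assignment.

```java
import java.util.Locale;

class LetterPercent {
    // Percentage of one letter among all counted letters, with two decimal places.
    static String percent(long letterCount, long totalLetters) {
        if (totalLetters == 0) {
            return "0.00"; // avoid dividing by zero when the text contains no letters
        }
        // Locale.ROOT keeps the decimal point a '.' regardless of system settings
        return String.format(Locale.ROOT, "%.2f", 100.0 * letterCount / totalLetters);
    }

    public static void main(String[] args) {
        // Hypothetical counts: 'e' appears 3 times among 24 letters in total.
        System.out.println("e " + percent(3, 24) + "%"); // prints "e 12.50%"
    }
}
```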

First, the code:

package filesearch;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class Filesearch {
    // All letters to count; upper and lower case are counted separately
    static String str1 = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static char[] ch1 = str1.toCharArray();
    // num[k] holds the number of occurrences of ch1[k]
    public static int[] num = new int[ch1.length];
    // Total number of letters read
    public static int sum = 0;

    public static void readFile() {
        File file = new File("f:\\qq\\Harry Potter and the Sorcerer's Stone.txt");
        // try-with-resources closes the file stream even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String str = br.readLine();
            // Store the number of occurrences of each letter in the array
            while (str != null) {
                for (int j = 0; j < str.length(); j++) {
                    int k = str1.indexOf(str.charAt(j));
                    if (k >= 0) {
                        sum++;
                        num[k]++;
                    }
                }
                str = br.readLine();
            }
        } catch (IOException e) { // also covers FileNotFoundException
            e.printStackTrace();
            return;
        }
        // Sort by number of occurrences (selection sort), highest first;
        // equal counts are ordered by character code, so 'A' comes before 'a'
        for (int p = 0; p < ch1.length - 1; p++) {
            int o = p;
            for (int q = p + 1; q < ch1.length; q++) {
                if (num[o] < num[q] || (num[o] == num[q] && ch1[q] < ch1[o])) {
                    o = q;
                }
            }
            if (o != p) {
                char ff = ch1[o];
                ch1[o] = ch1[p];
                ch1[p] = ff;
                int fff = num[o];
                num[o] = num[p];
                num[p] = fff;
            }
        }
        // Output the sorted result
        for (int k = 0; k < ch1.length; k++) {
            double percent = sum == 0 ? 0.0 : 100.0 * num[k] / sum;
            System.out.printf("%c %.2f%%%n", ch1[k], percent);
        }
    }

    public static void main(String[] args) {
        readFile();
    }
}

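Approach:

1. Create an array ch1 that holds all the letters (a-z, A-Z), and an array num with one counter per letter.

2. Read the file line by line and compare each character against the letter array; on a match, increment the entry of num at the same index as in ch1, so the two arrays stay aligned.

3. Sort num and ch1 in sync.

4. Output the results.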
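The four steps above can also be written without the inner comparison loop. The sketch below is one possible variant, not the listing's method: it assumes upper and lower case are merged into 26 counters (the listing keeps 52 separate entries), and the names LetterFrequencySketch and report are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LetterFrequencySketch {
    // Returns lines such as "a 50.00%", most frequent letter first,
    // with equal counts in alphabetical order.
    static List<String> report(String text) {
        int[] counts = new int[26];
        int total = 0;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch >= 'A' && ch <= 'Z') {
                ch = (char) (ch - 'A' + 'a'); // assumption: merge upper and lower case
            }
            if (ch >= 'a' && ch <= 'z') {
                counts[ch - 'a']++; // the letter itself is the array index
                total++;
            }
        }
        // Sort the indices 0..25 instead of swapping two parallel arrays.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < 26; i++) {
            order.add(i);
        }
        order.sort((x, y) -> counts[x] != counts[y]
                ? Integer.compare(counts[y], counts[x]) // higher count first
                : Integer.compare(x, y));               // tie: alphabetical
        List<String> lines = new ArrayList<>();
        for (int idx : order) {
            double percent = total == 0 ? 0.0 : 100.0 * counts[idx] / total;
            lines.add(String.format(Locale.ROOT, "%c %.2f%%", (char) ('a' + idx), percent));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report("Banana")) {
            System.out.println(line);
        }
    }
}
```

Sorting a list of indices keeps each letter and its count together without a hand-written swap for two arrays.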

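Requirement 2:

Output the N most frequent English words in a single file.

(Note: a string that begins with an English letter and consists of English letters and digits counts as a word. Words are separated by delimiters and are case-insensitive. In the output, all words are written in lowercase.)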
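As an illustration of this word definition, the following sketch splits a line at every character that is not a letter or digit and keeps only tokens that start with a letter. It is an assumption-laden reading of the rule: apostrophes and hyphens are treated as delimiters, and the names TokenizerSketch and words are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

class TokenizerSketch {
    // A word: one letter followed by any number of letters or digits.
    private static final Pattern WORD = Pattern.compile("[a-z][a-z0-9]*");

    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        // Lowercase first, then split at runs of non-alphanumeric characters.
        for (String token : line.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (WORD.matcher(token).matches()) { // rejects "" and tokens like "123file"
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, World! file123 123file a1"));
        // prints [hello, world, file123, a1]
    }
}
```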

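Approach:

1) Loop: read the file line by line, convert each line to lowercase with toLowerCase(), split it into words at the delimiters, and store the words in an array.

2) Find every word that occurs.

3) Traverse all the words and count the occurrences of each distinct word, storing the counts in an int array.

4) Sort the int array and the word array in sync.

5) Output the top N words and their counts.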
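Steps 2) to 5) can also be sketched with a HashMap, which avoids comparing every word against every other word. This is an alternative under stated assumptions, not the approach described above; the names TopWordsSketch and topN are made up, and the order among words with equal counts is left unspecified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopWordsSketch {
    // Returns up to n lines of the form "word count", highest count first.
    static List<String> topN(List<String> words, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // first occurrence stores 1, later ones add 1
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> Integer.compare(y.getValue(), x.getValue()));
        List<String> lines = new ArrayList<>();
        int limit = Math.min(n, entries.size()); // never read past the distinct words
        for (int i = 0; i < limit; i++) {
            lines.add(entries.get(i).getKey() + " " + entries.get(i).getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("the", "cat", "the", "dog", "cat", "the");
        System.out.println(topN(sample, 2)); // prints [the 3, cat 2]
    }
}
```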

The code is as follows:

package filesearch;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Scanner;

public class FileSearchWords {
    private static String str = "";
    private static Scanner sc = new Scanner(System.in);
    private static BufferedReader cin = null;
    // a: every word in reading order; c: distinct words; b[i]: occurrences of c[i]
    private static String a[] = new String[1000000];
    private static String c[] = new String[1000000];
    private static int b[] = new int[1000000];
    // length: total words; length1: distinct words; nn: words to print; j: current index in a
    private static int length = 0, length1 = 0, nn = 0, j = 0;

    // Read the text data
    public static void ReadFile() {
        File file = new File("F:\\qq\\Harry Potter and the Sorcerer's Stone.txt");
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), "UTF-8"))) {
            cin = reader;
            str = cin.readLine();
            store();
        } catch (IOException e) {
            System.out.println("Failed to read the file!");
            e.printStackTrace();
        }
    }

    // Store all words in array a
    public static void store() throws IOException {
        while (str != null) {
            str = str.toLowerCase();
            for (int i = 0; i < str.length(); i++) {
                char ch = str.charAt(i);
                boolean letter = ch >= 'a' && ch <= 'z';
                boolean digit = ch >= '0' && ch <= '9';
                if (letter || (digit && !a[j].isEmpty())) {
                    // A word starts with a letter and continues with letters or digits
                    a[j] = a[j] + ch;
                } else if (!a[j].isEmpty()) {
                    // Any other character ends the current word
                    j = j + 1;
                    a[j] = "";
                }
            }
            // The end of a line also ends the current word
            if (!a[j].isEmpty()) {
                j = j + 1;
                a[j] = "";
            }
            str = cin.readLine();
        }
        length = j;
    }

    // Count the occurrences of each word
    public static void Statistics() {
        int tt = 0;
        for (int i = 0; i < length; i++) {
            boolean rt = false;
            // Check whether this word has been seen before
            for (int k = 0; k < tt; k++) {
                if (a[i].equals(c[k])) {
                    rt = true;
                    break;
                }
            }
            if (!rt) {
                c[tt] = a[i];
                tt++;
            }
        }
        length1 = tt;
        // Count how often each distinct word occurs
        for (int i = 0; i < length1; i++) {
            b[i] = 0;
            for (int k = 0; k < length; k++) {
                if (c[i].equals(a[k])) {
                    b[i]++;
                }
            }
        }
    }

    // Selection sort: order the counts and their words together, highest count first
    public static void Sort() {
        int t3 = 0, t2 = 0;
        String sr = "";
        for (int i = 0; i < length1 - 1; i++) {
            t3 = i;
            for (int k = i + 1; k < length1; k++) {
                if (b[t3] < b[k]) {
                    t3 = k;
                }
            }
            if (t3 != i) {
                t2 = b[i];
                b[i] = b[t3];
                b[t3] = t2;
                sr = c[i];
                c[i] = c[t3];
                c[t3] = sr;
            }
        }
    }

    // Output the result
    public static void show() {
        // Never print more entries than there are distinct words
        int limit = Math.min(nn, length1);
        for (int k = 0; k < limit; k++) {
            // Percentage is the word's share of all word occurrences
            System.out.printf("%s\t%d    %.2f%%%n", c[k], b[k], (double) b[k] / length * 100);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Enter the number of words to output:");
        nn = sc.nextInt();
        a[0] = "";
        ReadFile();
        Statistics();
        Sort();
        show();
    }
}

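Requirement 3 (Function 1):

Output all distinct words in the file, sorted by number of occurrences from most to least; words with the same count are listed in dictionary order.

Approach:

1) Loop: read the file line by line, convert each line to lowercase with toLowerCase(), split it into words at the delimiters, and store the words in an array.

2) Find the first occurrence of every word.

3) Traverse all the words and count the occurrences of each distinct word, storing the counts in an int array.

4) Sort the int array and the word array in sync.

5) Output all the words and their counts.

(In practice, Requirement 3 is almost the same as Requirement 2; only small changes to the Requirement 2 code are needed.)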
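The ordering rule of this requirement (count descending, then dictionary order) can be expressed as a single comparator. The sketch below is one way to do it, assuming the counts are already available in a map; WordOrderSketch, WordCount, and sorted are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordOrderSketch {
    // Keeps a word and its count together, so no parallel arrays need swapping.
    static final class WordCount {
        final String word;
        final int count;

        WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    // Returns lines "word count": highest count first, ties in dictionary order.
    static List<String> sorted(Map<String, Integer> counts) {
        List<WordCount> list = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            list.add(new WordCount(e.getKey(), e.getValue()));
        }
        list.sort(Comparator.comparingInt((WordCount w) -> w.count)
                .reversed()                  // most frequent first
                .thenComparing(w -> w.word)); // equal counts: dictionary order
        List<String> lines = new ArrayList<>();
        for (WordCount w : list) {
            lines.add(w.word + " " + w.count);
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("b", 2);
        counts.put("a", 2);
        counts.put("c", 5);
        System.out.println(sorted(counts)); // prints [c 5, a 2, b 2]
    }
}
```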

The code is as follows:

package filesearch;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;

public class FileSearchWords {
    private static String str = "";
    private static BufferedReader cin = null;
    // a: every word in reading order; c: distinct words; b[i]: occurrences of c[i]
    private static String a[] = new String[1000000];
    private static String c[] = new String[1000000];
    private static int b[] = new int[1000000];
    // length: total words; length1: distinct words; j: current index in a
    private static int length = 0, length1 = 0, j = 0;

    // Read the text data
    public static void ReadFile() {
        File file = new File("F:\\qq\\Harry Potter and the Sorcerer's Stone.txt");
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), "UTF-8"))) {
            cin = reader;
            str = cin.readLine();
            store();
        } catch (IOException e) {
            System.out.println("Failed to read the file!");
            e.printStackTrace();
        }
    }

    // Store all words in array a
    public static void store() throws IOException {
        while (str != null) {
            str = str.toLowerCase();
            for (int i = 0; i < str.length(); i++) {
                char ch = str.charAt(i);
                boolean letter = ch >= 'a' && ch <= 'z';
                boolean digit = ch >= '0' && ch <= '9';
                if (letter || (digit && !a[j].isEmpty())) {
                    // A word starts with a letter and continues with letters or digits
                    a[j] = a[j] + ch;
                } else if (!a[j].isEmpty()) {
                    // Any other character ends the current word
                    j = j + 1;
                    a[j] = "";
                }
            }
            // The end of a line also ends the current word
            if (!a[j].isEmpty()) {
                j = j + 1;
                a[j] = "";
            }
            str = cin.readLine();
        }
        length = j;
    }

    // Count the occurrences of each word
    public static void Statistics() {
        int tt = 0;
        for (int i = 0; i < length; i++) {
            boolean rt = false;
            // Check whether this word has been seen before
            for (int k = 0; k < tt; k++) {
                if (a[i].equals(c[k])) {
                    rt = true;
                    break;
                }
            }
            if (!rt) {
                c[tt] = a[i];
                tt++;
            }
        }
        length1 = tt;
        // Count how often each distinct word occurs
        for (int i = 0; i < length1; i++) {
            b[i] = 0;
            for (int k = 0; k < length; k++) {
                if (c[i].equals(a[k])) {
                    b[i]++;
                }
            }
        }
    }

    // Selection sort: highest count first; equal counts in dictionary order
    public static void Sort() {
        int t3 = 0, t2 = 0;
        String sr = "";
        for (int i = 0; i < length1 - 1; i++) {
            t3 = i;
            for (int k = i + 1; k < length1; k++) {
                if (b[t3] < b[k] || (b[t3] == b[k] && c[k].compareTo(c[t3]) < 0)) {
                    t3 = k;
                }
            }
            if (t3 != i) {
                t2 = b[i];
                b[i] = b[t3];
                b[t3] = t2;
                sr = c[i];
                c[i] = c[t3];
                c[t3] = sr;
            }
        }
    }

    // Print every distinct word and its count to the console
    public static void show1() {
        for (int k = 0; k < length1; k++) {
            System.out.print(c[k] + "\t \t\t" + b[k] + "\n");
        }
    }

    // Write all results to a file
    public static void Writefile() throws IOException {
        File file = new File("t1.txt");
        // FileWriter creates the file if needed; without the append flag,
        // each run replaces the previous results instead of adding to them
        try (BufferedWriter out = new BufferedWriter(new FileWriter(file))) {
            for (int i = 0; i < length1; i++) {
                // Percentage is the word's share of all word occurrences
                double f4 = (double) b[i] / length * 100;
                out.write("No. " + (i + 1) + ": ");
                out.write(c[i] + "\t" + b[i] + "\t" + String.format("%.2f", f4) + "%");
                out.write("\r\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        a[0] = "";
        ReadFile();
        Statistics();
        Sort();
        // show1(); // console output, replaced by the file output below
        Writefile();
    }
}

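At first, the console could not show that much output. After looking it up online, I found that the console's buffer was too small, so I wrote the results to a file instead.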
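Writing the results to a file can also be done with java.nio.file.Files and try-with-resources, which closes the writer automatically. This is a minimal sketch, not the listing's method; ResultWriterSketch, writeLines, and the temporary file used in main are assumptions for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class ResultWriterSketch {
    // Writes one result per line; an existing file is overwritten.
    static void writeLines(Path target, List<String> lines) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                out.write(line);
                out.newLine(); // platform line separator
            }
        } // the writer is flushed and closed here, even if write() throws
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical results written to a temporary file.
        Path target = Files.createTempFile("word-counts", ".txt");
        writeLines(target, Arrays.asList("the\t3", "cat\t2"));
        System.out.println("Wrote " + target);
    }
}
```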

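Requirement 4 (Function 2):

Given a directory, recursively traverse every subdirectory and run Function 1.

Approach:

1) Find all the files in the given directory and store them in an array.

2) Find the first occurrence of every file name.

3) Traverse all the file names and count the occurrences of each distinct word, storing the counts in an int array.

4) Sort the int array and the file-name array in sync.

5) Output all the file names and their counts.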

The code is as follows:


As you can see, this relies on the methods and operations for working with directories.
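As an illustration of the directory-handling step only, a recursive traversal with java.io.File might look like the sketch below. It is not the author's code; DirectoryWalkSketch and collectFiles are hypothetical names, and the word counting of Function 1 would be applied to each file found.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class DirectoryWalkSketch {
    // Adds every regular file under dir, including those in subdirectories, to found.
    static void collectFiles(File dir, List<File> found) {
        File[] children = dir.listFiles(); // null if dir is not a readable directory
        if (children == null) {
            return;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                collectFiles(child, found); // recurse into the subdirectory
            } else if (child.isFile()) {
                found.add(child);
            }
        }
    }

    public static void main(String[] args) {
        // Assumption: the directory is passed as the first argument; default is the current one.
        File root = new File(args.length > 0 ? args[0] : ".");
        List<File> found = new ArrayList<>();
        collectFiles(root, found);
        for (File f : found) {
            System.out.println(f.getPath());
        }
    }
}
```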