Word Count, Continued

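Step 1: Output the N most frequent English words in a single file.

Feature 1: Output every distinct word in the file, sorted by occurrence count from highest to lowest; words with the same count are sorted alphabetically.

Feature 2: Given a directory, run the count on every file in that directory.

Feature 3: Given a directory, recursively traverse all of its subdirectories and run the word count on every file found.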
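
Features 2 and 3 differ only in how deep the directory walk goes. The sketch below is one possible way to do it with `java.nio.file.Files.walk`, not the code used later in this post. The class name `DirectoryScanSketch`, the helper names, and the `-r` command-line flag are all made up for illustration, and files are assumed to be readable as UTF-8 text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the original program.
final class DirectoryScanSketch {
    // Lists regular files under dir. Depth 1 stays in dir itself (feature 2);
    // an unbounded depth also descends into every subdirectory (feature 3).
    static List<Path> listFiles(Path dir, boolean recursive) throws IOException {
        int depth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(dir, depth)) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    // Counts runs of ASCII letters in one file (assumed to be UTF-8 text).
    static int countWords(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Matcher m = Pattern.compile("[a-zA-Z]+").matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed): java DirectoryScanSketch.java <dir> [-r]
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        boolean recursive = args.length > 1 && "-r".equals(args[1]);
        for (Path file : listFiles(dir, recursive)) {
            System.out.println(file + ": " + countWords(file) + " words");
        }
    }
}
```

Keeping the walk separate from the counting means the same per-file routine serves both features; only the depth argument changes.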

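Feature 4: Output the n most frequent words.

For example, at the prompt asking how many top words to report, entering 10 prints the 10 most frequent words. When no number is given, the frequencies of all words are listed by default.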
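
Features 1 and 4 come down to three steps: count the words, sort by count with an alphabetical tie-break, and cut the list at n. Here is a minimal sketch of those steps. It assumes a word is a run of ASCII letters and that case is folded, neither of which the task statement specifies. `WordFrequencySketch`, `count`, and `top` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original program.
final class WordFrequencySketch {
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    // Counts words; lower-casing is an assumption so "The" and "the" share an entry.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            counts.merge(m.group().toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    // Sorts by count (descending), then alphabetically; limit <= 0 keeps every word.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        int end = (limit <= 0 || limit > entries.size()) ? entries.size() : limit;
        return entries.subList(0, end);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("the cat and the dog and the bird");
        for (Map.Entry<String, Integer> e : top(counts, 3)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // prints "the 3", "and 2", "bird 1" on separate lines
    }
}
```

Passing a limit of 0 covers the default case where no number is given and every word is listed.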

Step 2: Support stop words

In a novel, the most frequent words are usually ones like "a", "it", "the", "and", and "this". We can build a stop-word file (a stop list) and skip these words when counting. We call this file "stopwords.txt".
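
A rough sketch of stop-word filtering follows. The post does not define the layout of "stopwords.txt", so this assumes whitespace-separated words with any number per line, and assumes matching is case-insensitive. `StopWordSketch` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original program.
final class StopWordSketch {
    // Reads the stop list; the whitespace-separated layout is an assumption.
    static Set<String> load(Path file) throws IOException {
        Set<String> stop = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            for (String w : line.trim().split("\\s+")) {
                if (!w.isEmpty()) {
                    stop.add(w.toLowerCase());
                }
            }
        }
        return stop;
    }

    // Counts words, skipping every word found in the stop set.
    static Map<String, Integer> count(String text, Set<String> stop) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = Pattern.compile("[a-zA-Z]+").matcher(text);
        while (m.find()) {
            String w = m.group().toLowerCase();
            if (!stop.contains(w)) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("stopwords.txt");
        Set<String> stop;
        if (Files.exists(file)) {
            stop = load(file);
        } else {
            // Fallback list taken from the examples in the text above.
            stop = new HashSet<>(Arrays.asList("a", "it", "the", "and", "this"));
        }
        System.out.println(count("The cat and the hat", stop));
    }
}
```

Filtering while counting, rather than deleting entries afterwards, avoids modifying the map during iteration.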

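Step 3: What if we want to see which phrases are common?

First, define a phrase: "two or more English words separated only by spaces". See the examples below:

  hello world   // this is a phrase

  hello, world  // this is not a phrase

Phrases with the same frequency are sorted alphabetically.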
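
One way to apply this definition is to find runs of words joined only by spaces and then slide an n-word window over each run. The sketch below does that. The fixed phrase length `n`, the case folding, and the name `PhraseSketch` are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original program.
final class PhraseSketch {
    // A run of two or more words with nothing but spaces between them.
    private static final Pattern RUN = Pattern.compile("[a-zA-Z]+(?: +[a-zA-Z]+)+");

    // Counts every n-word phrase; punctuation ends a run, so no phrase spans it.
    static Map<String, Integer> count(String text, int n) {
        if (n < 2) {
            throw new IllegalArgumentException("a phrase needs at least two words");
        }
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = RUN.matcher(text);
        while (m.find()) {
            String[] w = m.group().toLowerCase().split(" +");
            for (int i = 0; i + n <= w.length; i++) {
                String phrase = String.join(" ", Arrays.copyOfRange(w, i, i + n));
                counts.merge(phrase, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello world", 2));   // {hello world=1}
        System.out.println(count("hello, world", 2));  // {}
    }
}
```

The `TreeMap` keeps phrases in alphabetical order, so a stable sort by descending count afterwards leaves equal-frequency phrases in dictionary order.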

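Step 4: Unify verb forms before counting.

We want to find common words and phrases, but English verbs change with tense and voice, so the same word or phrase ends up counted as several different ones. How can we solve this?

Suppose we have a text file in which every line has this form:

base-form variant1 variant2 ..., with the words separated by spaces.

e.g. the verb TAKE has these forms: take takes took taken taking

When running any of the features above, we want an option that maps every form of a verb to its base form for counting.

Feature: support normalizing verb forms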
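
The verb file described above maps naturally to a lookup table from each form to its base form. A small sketch of that table follows. `VerbNormalizerSketch` and its method names are hypothetical, and lower-casing before lookup is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original program.
final class VerbNormalizerSketch {
    // Each line is "base variant1 variant2 ..."; every form maps to the first word.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> toBase = new HashMap<>();
        for (String line : lines) {
            String[] forms = line.trim().toLowerCase().split("\\s+");
            if (forms[0].isEmpty()) {
                continue; // blank line
            }
            for (String form : forms) {
                toBase.put(form, forms[0]);
            }
        }
        return toBase;
    }

    // Returns the base form if the word is a known verb form, else the word itself.
    static String normalize(String word, Map<String, String> toBase) {
        String lower = word.toLowerCase();
        return toBase.getOrDefault(lower, lower);
    }

    // Counts words after mapping each one through the table.
    static Map<String, Integer> count(String text, Map<String, String> toBase) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = Pattern.compile("[a-zA-Z]+").matcher(text);
        while (m.find()) {
            counts.merge(normalize(m.group(), toBase), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> toBase = parse(Arrays.asList("take takes took taken taking"));
        System.out.println(count("He took it and she takes it", toBase));
        // {and=1, he=1, it=2, she=1, take=2}
    }
}
```

Because normalization happens per word before counting, the same table can be applied ahead of phrase counting as well.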

package Text;

import java.io.*;
import java.text.DecimalFormat;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Text {
	// Sample text (opening of "Gone with the Wind") shared by all of the functions below.
	static String words = "SCARLETT O’HARA was not beautiful, but men seldom realized it when caught by her charm as the Tarleton twins were. In her face were too sharply blended the delicate features of her mother, a Coast aristocrat of French descent, and the heavy ones of her florid Irish father. But it was an arresting face, pointed of chin, square of jaw. Her eyes were pale green without a touch of hazel, starred with bristly black lashes and slightly tilted at the ends. Above them, her thick black brows slanted upward, cutting a startling oblique line in her magnolia-white skin—that skin so prized by Southern women and so carefully guarded with bonnets, veils and mittens against hot Georgia suns.\n"
			+ "  Seated with Stuart and Brent Tarleton in the cool shade of the porch of Tara, her father’s plantation, that bright April afternoon of 1861, she made a pretty picture. Her new green flowered-muslin dress spread its twelve yards of billowing material over her hoops and exactly matched the flat-heeled green morocco slippers her father had recently brought her from Atlanta. The dress set off to perfection the seventeen-inch waist, the smallest in three counties, and the tightly fitting basque showed breasts well matured for her sixteen years. But for all the modesty of her spreading skirts, the demureness of hair netted smoothly into a chignon and the quietness of small white hands folded in her lap, her true self was poorly concealed. The green eyes in the carefully sweet face were turbulent, willful, lusty with life, distinctly at variance with her decorous demeanor. Her manners had been imposed upon her by her mother’s gentle admonitions and the sterner discipline of her mammy; her eyes were her own.\n"
			+ "  On either side of her, the twins lounged easily in their chairs, squinting at the sunlight through tall mint-garnished glasses as they laughed and talked, their long legs, booted to the knee and thick with saddle muscles, crossed negligently. Nineteen years old, six feet two inches tall, long of bone and hard of muscle, with sunburned faces and deep auburn hair, their eyes merry and arrogant, their bodies clothed in identical blue coats and mustard-colored breeches, they were as much alike as two bolls of cotton.\n"
			+ "  Outside, the late afternoon sun slanted down in the yard, throwing into gleaming brightness the dogwood trees that were solid masses of white blossoms against the background of new green. The twins’ horses were hitched in the driveway, big animals, red as their masters’ hair; and around the horses’ legs quarreled the pack of lean, nervous possum hounds that accompanied Stuart and Brent wherever they went. A little aloof, as became an aristocrat, lay a black-spotted carriage dog, muzzle on paws, patiently waiting for the boys to go home to supper.\n"
			+ "  ";
	static String letter = "abcdefghijklmnopqrstuvwxyz";

	// Counts every word in the sample text and prints each word's count and relative frequency.
	public static void b() {
		DecimalFormat df = new DecimalFormat("0.0000%");
		// Reuse the class-level sample text instead of keeping a second copy here.
		Pattern p = Pattern.compile("[a-zA-Z]+");
		Matcher m = p.matcher(words);
		HashMap<String, Integer> map = new HashMap<String, Integer>();
		int count = 0; // total number of words seen
		while (m.find()) {
			count++;
			String w = m.group();
			Integer seen = map.get(w);
			map.put(w, seen == null ? 1 : seen + 1);
		}
		System.out.println(count);
		for (Map.Entry<String, Integer> entry : map.entrySet()) {
			System.out.println(entry.getKey() + ";" + entry.getValue());
			System.out.println(df.format(entry.getValue() * 1.0 / count));
		}
	}

	// Counts how often each of the 26 letters occurs in the text and prints count and rate.
	public static void a(String text, String letter) throws UnsupportedEncodingException {
		DecimalFormat df = new DecimalFormat("0.0000%");
		// Occurrence count and formatted rate for each of the 26 letters
		int[] counter = new int[26];
		String[] rate = new String[26];
		// Total number of letters; punctuation, spaces, and digits are excluded
		int total_counter = 0;
		// Lower-case the text first so that upper-case letters are counted as well
		char[] text_tr = text.toLowerCase().toCharArray();
		char[] letter_tr = letter.toCharArray();
		// Outer loop: the 26 letters; inner loop: every character of the text
		for (int i = 0; i < 26; i++) {
			for (int j = 0; j < text_tr.length; j++) {
				// Each match increments both the letter's counter and the total
				if (text_tr[j] == letter_tr[i]) {
					counter[i]++;
					total_counter++;
				}
			}
		}

		System.out.println("sum:" + total_counter);

		for (int i = 0; i < 26; i++) {
			rate[i] = df.format(counter[i] * 1.0 / total_counter);
		}
		for (int i = 0; i < 26; i++) {
			System.out.println(
					letter_tr[i] + "'s number is " + counter[i] + " and " + letter_tr[i] + "'s rate is " + rate[i]);
		}
	}
	// Prints the N most frequent words in the sample text; N is read from standard input.
	public static String c() {
		System.out.println("How many top words should be listed?");
		Scanner in = new Scanner(System.in);
		int N = in.nextInt();
		in.close();
		StringBuffer sb = new StringBuffer();
		// Reuse the class-level sample text instead of keeping a second copy here.
		Pattern p = Pattern.compile("[a-zA-Z]+");
		Matcher m = p.matcher(words);
		HashMap<String, Integer> map = new HashMap<String, Integer>();
		while (m.find()) {
			String w = m.group();
			Integer seen = map.get(w);
			map.put(w, seen == null ? 1 : seen + 1);
		}
		// Never ask for more words than the text contains.
		int limit = Math.min(N, map.size());
		for (int i = 0; i < limit; i++) {
			// Pick the most frequent remaining word; ties go to the word that sorts first.
			String best = null;
			int bestCount = 0;
			for (Map.Entry<String, Integer> entry : map.entrySet()) {
				int n = entry.getValue();
				if (n > bestCount || (n == bestCount && best != null && entry.getKey().compareTo(best) < 0)) {
					bestCount = n;
					best = entry.getKey();
				}
			}
			sb.append("word: ").append(best).append(" count: ").append(bestCount).append("\n");
			map.remove(best);
		}
		System.out.println(sb.toString());
		return sb.toString();
	}
	// Counts words, then drops the stop word "On" (hard-coded here) before printing.
	public static void d() {
		// Reuse the class-level sample text instead of keeping a second copy here.
		Pattern p = Pattern.compile("[a-zA-Z]+");
		Matcher m = p.matcher(words);
		HashMap<String, Integer> map = new HashMap<String, Integer>();
		while (m.find()) {
			String w = m.group();
			Integer seen = map.get(w);
			map.put(w, seen == null ? 1 : seen + 1);
		}
		// Remove the stop word directly. Calling map.remove() inside a loop over
		// map.entrySet() can throw ConcurrentModificationException.
		map.remove("On");
		for (Map.Entry<String, Integer> entry : map.entrySet()) {
			System.out.println(entry.getKey() + ";" + entry.getValue());
		}
	}
	public static void main(String args[]) throws UnsupportedEncodingException {
		int X = 0;
		System.out.println("Choose a function:");
		System.out.println("1. Count each letter and its frequency");
		System.out.println("2. Count each word and its frequency");
		System.out.println("3. Print the N most frequent words");
		System.out.println("4. Apply the stop-word list");
		Scanner input = new Scanner(System.in);
		X = input.nextInt();
		switch (X) {
		case 1: {
			a(words, letter);
			break;
		}
		case 2: {
			b();
			break;
		}
		case 3: {
			c();
			break;
		}
		case 4: {
			d();
			break;
		}
		}
		input.close();
	}

}

  Experiment screenshots:

Original article: https://www.cnblogs.com/KYin/p/11071646.html