单词 统计

用户需求:

英语的26 个字母的频率在一本小说中是如何分布的?

某类型文章中常出现的单词是什么?

某作家最常用的词汇是什么?

《哈利波特》 中最常用的短语是什么,等等。

我们就写一些程序来解决这个问题,满足一下我们的好奇心。

第0步:输出某个英文文本文件中 26 字母出现的频率,由高到低排列,并显示字母出现的百分比,精确到小数点后面两位。

字母频率 = 这个字母出现的次数 / (所有A-Z,a-z字母出现的总数)

如果两个字母出现的频率一样,那么就按照字典序排列。  如果 S 和 T 出现频率都是 10.21%, 那么, S 要排在T 的前面。

第1步:输出单个文件中的前 N 个最常出现的英语单词。

作用:一个用于统计文本文件中的英语单词出现频率。

单词:以英文字母开头,由英文字母和字母数字符号组成的字符串视为一个单词。单词以分隔符分割且不区分大小写。在输出时,所有单词都用小写字符表示。

英文字母:A-Z,a-z

字母数字符号:A-Z,a-z,0-9

第1步:输出单个文件中的前 N 个最常出现的英语单词。

分割符:空格,非字母数字符号 例:good123是一个单词,123good不是一个单词。good,Good和GOOD是同一个单词。

package text428;
import java.text.DecimalFormat;
import java.util.*;
import java.util.regex.*;
public class tongji {
	static DecimalFormat df = new DecimalFormat("0.0000%");
	public static void main(String[] args) {
		String word = "SCARLETT O’HARA was not beautiful, but men seldom realized it when caught by her charmas the Tarleton twins were. In her face were too sharply blended the delicate features of her mother,a Coast aristocrat of French descent, and the heavy ones of her florid Irish father. But it was anarresting face, pointed of chin, square of jaw. Her eyes were pale green without a touch of hazel,starred with bristly black lashes and slightly tilted at the ends. Above them, her thick black browsslanted upward, cutting a startling oblique line in her magnolia-white skin—that skin so prized bySouthern women and so carefully guarded with bonnets, veils and mittens against hot Georgiasuns.
" + 
				"  Seated with Stuart and Brent Tarleton in the cool shade of the porch of Tara, her father’splantation, that bright April afternoon of 1861, she made a pretty picture. Her new green flowered-muslin dress spread its twelve yards of billowing material over her hoops and exactly matched theflat-heeled green morocco slippers her father had recently brought her from Atlanta. The dress set off to perfection the seventeen-inch waist, the smallest in three counties, and the tightly fittingbasque showed breasts well matured for her sixteen years. But for all the modesty of her spreadingskirts, the demureness of hair netted smoothly into a chignon and the quietness of small whitehands folded in her lap, her true self was poorly concealed. The green eyes in the carefully sweetface were turbulent, willful, lusty with life, distinctly at variance with her decorous demeanor. Hermanners had been imposed upon her by her mother’s gentle admonitions and the sterner disciplineof her mammy; her eyes were her own.
" + 
				"  On either side of her, the twins lounged easily in their chairs, squinting at the sunlight throughtall mint-garnished glasses as they laughed and talked, their long legs, booted to the knee and thickwith saddle muscles, crossed negligently. Nineteen years old, six feet two inches tall, long of boneand hard of muscle, with sunburned faces and deep auburn hair, their eyes merry and arrogant,their bodies clothed in identical blue coats and mustard-colored breeches, they were as much alikeas two bolls of cotton.
" + 
				"  Outside,Caught the late afternoon sun slanted down in the yard, throwing into gleaming brightness thedogwood trees that were solid masses of white blossoms against the background of new green. Thetwins’ horses were hitched in the driveway, big animals, red as their masters’ hair; and around thehorses’ legs quarreled the pack of lean, nervous possum hounds that accompanied Stuart and Brentwherever they went. A little aloof, as became an aristocrat, lay a black-spotted carriage dog,muzzle on paws, patiently waiting for the boys to go home to supper.
" + 
				"  ";
		String words=word.toLowerCase();
		String reg = "[a-zA-Z]+";
		Pattern p = Pattern.compile(reg);
		Matcher m = p.matcher(words);
		HashMap<String, Integer> map = new HashMap<String, Integer>();
		Integer count = 0;
		while (m.find()) {
			count++;
			String w = m.group();
			if (null == map.get(w)) {
				map.put(w, 1);
			} else {
				int x = map.get(w);
				map.put(w, x + 1);
			}
		}
		System.out.println(count);
		for (Map.Entry<String, Integer> entry:map.entrySet()) {
			System.out.println(entry.getKey()+ ";"+ entry.getValue());
			System.out.println( df.format(entry.getValue()*1.0/count));
			
		}
		
		//System.out.println(map);
		
	}

}

  实验截图:

原文地址:https://www.cnblogs.com/KYin/p/11071644.html