【python】Word frequency statistics for article text (jieba segmentation with a custom dictionary)

Counting word frequencies is easy in Python, and running frequency statistics over a whole article takes only a few lines.

1. Add a custom dictionary (for terms jieba doesn't know, e.g. 超级赛亚人, 奥里给)
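jieba's user dictionary is a plain UTF-8 text file with one entry per line: the word, optionally followed by a frequency and a part-of-speech tag, separated by spaces. A minimal dict.txt might look like:

```text
超级赛亚人 10 n
奥里给 5
```

The frequency and tag columns can be omitted; jieba will estimate a frequency that keeps the word from being split.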

2. Segment the text with jieba

PS: just drop the article into tf.txt and the custom dictionary into dict.txt, and you're good to go.

import jieba

# Load the custom dictionary before segmenting
jieba.load_userdict("dict.txt")

# Read the article text
with open("tf.txt", encoding="utf-8") as f:
    txt = f.read()

words = jieba.lcut(txt)

# Count occurrences of each word
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Sort by frequency, highest first
items = sorted(counts.items(), key=lambda x: x[1], reverse=True)

# Print the top 100 words with their counts
for word, count in items[:100]:
    print("{0:<10}{1:>5}".format(word, count))

print()

# Print the top 100 words with their relative frequencies
# (the original hardcoded the total as 35323; len(words) computes it)
total = len(words)
for word, count in items[:100]:
    print("{0:<10}{1:>10.4f}".format(word, count / total))
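The counting and sorting steps above can also be done with the standard library's collections.Counter. A minimal sketch, using a hand-made token list in place of jieba.lcut output so it runs without jieba installed:

```python
from collections import Counter

# Pre-segmented tokens standing in for jieba.lcut(txt)
words = ["python", "词频", "统计", "python", "jieba", "python", "词频"]

counts = Counter(words)

# most_common(n) returns (word, count) pairs sorted by count, descending
for word, count in counts.most_common(3):
    print("{0:<10}{1:>5}".format(word, count))
```

Counter replaces both the manual dict accumulation and the sort, which keeps the script shorter and less error-prone.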

Sample output:

Original post: https://www.cnblogs.com/helenlee01/p/12617489.html