Chinese Word Frequency Statistics

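# coding=utf-8
import jieba

# Punctuation and whitespace tokens to leave out of the count
exclude = {',', '、', '。', '\u3000', '\n', '"', '《', '》', '?'}

# Read the novel text (the file is assumed to be UTF-8 encoded)
with open('doupo.txt', 'r', encoding='utf-8') as f:
    txt = f.read()

wordList = list(jieba.cut(txt))    # segment the text into words
wordSet = set(wordList) - exclude  # unique words minus excluded tokens

# Count how often each unique word occurs
wordDict = {}
for w in wordSet:
    wordDict[w] = wordList.count(w)

# Sort by count, descending, and print the 20 most frequent words
dictList = list(wordDict.items())
dictList.sort(key=lambda x: x[1], reverse=True)
for item in dictList[:20]:
    print(item)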
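
The script above calls wordList.count(w) once per unique word, so it rescans the whole token list each time and can be slow on a full novel. A minimal sketch of a single-pass alternative using collections.Counter from the standard library follows. The function name top_words and the sample token list are hypothetical, chosen only for illustration; the sketch takes an already segmented list (such as the output of jieba.cut) so it runs without jieba installed.

```python
from collections import Counter


def top_words(tokens, exclude, n=20):
    """Return the n most frequent tokens, skipping anything in exclude.

    tokens  -- an iterable of already segmented words (hypothetical input,
               e.g. the result of jieba.cut on the text)
    exclude -- a set of tokens to ignore, such as punctuation
    """
    # Counter tallies every token in one pass over the input
    counts = Counter(t for t in tokens if t not in exclude)
    # most_common sorts by count, descending
    return counts.most_common(n)


# Hand-written sample tokens standing in for real segmentation output
sample_tokens = ['萧炎', ',', '萧炎', '斗气', '。', '斗气', '萧炎']
sample_exclude = {',', '、', '。', '\u3000', '\n', '"', '《', '》', '?'}

for word, count in top_words(sample_tokens, sample_exclude):
    print(word, count)
```

With the real text, the call would be top_words(jieba.cut(txt), exclude), assuming txt and exclude are defined as in the script above.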

Original article: https://www.cnblogs.com/swxvico/p/8660858.html
