A complete English word-frequency count example using a file

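1. Read in the string to analyze

2. Split it into words

3. Count the words in a dictionary

4. Exclude grammatical (function) words

5. Sort

6. Output the top 20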
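The six steps above can be sketched as one small function. This is only a minimal illustration, not the script that follows: the name `top_words` and the sample sentence are assumptions made up for this example.

```python
def top_words(text, stop_words, n=20):
    """Return the n most frequent words in text, ignoring stop_words."""
    # Steps 1-2: lowercase, replace punctuation with spaces, split into words.
    text = text.lower()
    for ch in ",.?!()":
        text = text.replace(ch, " ")
    words = text.split()

    # Steps 3-4: count every word that is not a stop word.
    counts = {}
    for word in words:
        if word in stop_words:
            continue
        counts[word] = counts.get(word, 0) + 1

    # Steps 5-6: sort by count (descending) and keep the first n entries.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical sample text, used only for illustration.
sample = "The cat sat on the mat. The cat ran, and the dog sat."
print(top_words(sample, {"the", "a", "it", "to", "on", "and"}, n=3))
# → [('cat', 2), ('sat', 2), ('mat', 1)]
```

Slicing with `ranked[:n]` returns fewer than `n` entries when the text has fewer distinct words, so short inputs do not raise an error.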

# Step 1: write the lyrics to lyric.txt so they can be read back from the file.
lyric=open('lyric.txt','w')
lyric.write('''your butt is mine
 
I Gonna tell you right
 
Just show your face
 
In broad daylight
 
I'm telling you
 
On how I feel
 
Gonna Hurt Your Mind
 
Don't shoot to kill
 
Shamone
 
Shamone
 
Lay it on me
 
All right
 
I'm giving you
 
On count of three
 
To show your stuff
 
Or let it be
 
I'm telling you
 
Just watch your mouth
I know your game
 
What you're about
 
Well they say the sky's the limit
 
And to me that's really true
 
But my friend you have seen nothin'
 
Just wait till I get through
 
Because I'm bad,I'm bad
 
shamone
 
(Bad,bad,really,really bad)
 
You know I'm bad,I'm bad
 
(Bad,bad,really,really bad)
 
You know it
 
You know I'm bad,I'm bad
 
Come on,you know
 
(Bad,bad,really,really bad)
 
And the whole world
 
Has to answer right now
 
Just to tell you once again
 
Who's bad
 
The word is out
 
You're doin' wrong
 
Gonna lock you up
 
Before too long
 
Your lyin' eyes
 
Gonna tell you right
 
So listen up
 
Don't make a fight
 
Your talk is cheap
 
You're not a man
 
Your throwin' stones
 
To hide your hands
 
Well they say the sky's the limit
 
And to me that's really true
 
But my friend you have seen nothin'
 
Just wait till I get through
 
Because I'm bad,I'm bad
 
shamone
 
(Bad,bad,really,really bad)
 
You know I'm bad,I'm bad
 
(Bad,bad,really,really bad)
 
You know it
 
You know I'm bad,I'm bad
 
Come on,you know
 
(Bad,bad,really,really bad)
 
And the whole world
 
Has to answer right now
 
Just to tell you once again
 
Who's bad
 
We could change the world tomorrow
 
This could be a better place
 
If you don't like what I'm sayin'
 
Then won't you slap my face
 
Because I'm bad''')
lyric.close()
# Read the lyrics back; the with-block closes the file automatically.
with open('lyric.txt','r') as comment:
    bad=comment.read()

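# Step 2: lowercase, then turn punctuation and newlines into spaces.
bad=bad.lower()
for i in ",.?!()":
    bad=bad.replace(i,' ')
bad=bad.replace('\n',' ')
# split() with no argument also drops the empty strings left by repeated spaces.
words=bad.split()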
s=set(words)

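# Step 4: exclude grammatical words. Set difference does not raise KeyError
# when one of these words is missing from the text, unlike s.remove().
delete={"the","a","it","to","on","and"}
s=s-delete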
    
# Step 3: count each remaining word in the full word list.
dic={}
for i in s:
    # Skip empty or whitespace-only entries.
    if i.strip()=="":
        continue
    dic[i]=words.count(i)

# Steps 5-6: sort by count, descending, and print at most 20 entries.
lis=list(dic.items())
lis.sort(key=lambda x:x[1],reverse=True)
for item in lis[:20]:
    print(item)
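Calling `words.count(i)` inside the loop rescans the whole word list once per distinct word. As an alternative sketch, the standard library's `collections.Counter` counts everything in a single pass, and `re.findall` can replace the punctuation loop. The function name `top_words_counter` and the sample sentence below are assumptions for illustration only.

```python
import re
from collections import Counter


def top_words_counter(text, stop_words, n=20):
    """Return the n most frequent words using Counter instead of list.count()."""
    # Tokenize: runs of letters and apostrophes, so "i'm" stays one word.
    words = re.findall(r"[a-z']+", text.lower())
    # Count in one pass, skipping stop words.
    counts = Counter(w for w in words if w not in stop_words)
    # most_common(n) returns (word, count) pairs, highest count first.
    return counts.most_common(n)


# Hypothetical sample text, used only for illustration.
sample = "You know I'm here, you know it, you said."
print(top_words_counter(sample, {"it"}, n=2))
# → [('you', 3), ('know', 2)]
```

To apply it to the file from this post, pass the string read from lyric.txt together with the same stop-word set.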

Run result: each printed line is a (word, count) tuple, most frequent first.

Original source: https://www.cnblogs.com/mavenlon/p/7595133.html