Comprehensive exercise: English word frequency count

Download the lyrics of an English song, or an English article

sing  = '''
i'm just a little bit caught in the middle
life is a maze and love is a riddle
i don't know where to go
can't do it alone
i've tried but i don't know why
slow it down make it stop
or else my heart is going to pop
cause its to much yea its alot
to be something i'm not
i'm a fool out of love
cause i just can't get enough
i'm just a little bit caught in the middle
life is a maze and love is a riddle
i don't know where to go
can't do it alone
i've tride but i don't know why
i'm just a little girl lost in the moment
i'm so scared but i don't show it
i can't figure it out
it's bringing me down
i know i've got to let it go
and just enjoy the show
the sun is hot in the sky
just like a giant spot light
the people follow the signs
and sicronise in time
it's just no body knows
they got to take it to the show
i'm just a little bit caught in the middle
life is a maze and love is a riddle
i don't know where to go
can't do it alone
i've tried but i don't know why
i'm just a little girl lost in the moment
i'm so scared but i don't show it
i can't figure it out
it's bringing me down
i know i've got to let it go
and just enjoy the show
just engoy the show
i'm just a little bit caught in the middle
life is a maze and love is a riddle
i don't know where to go
can't do it alone
i've tride but i don't know why
i'm just a little girl lost in the moment
i'm so scared but i don't show it
i can't figure it out
it's bringing me down
i know i've got to let it go
and just enjoy the show
just enjoy the show
just enjoy the show
i want my money back
i want my money back
i want my money back
just enjoy the show
i want my money back
i want my money back
i want my money back
just enjoy the show 
'''

1. Replace all separators such as , . ? ! ' : with spaces

# Replace apostrophes, periods, question marks, and newlines with spaces
newSing = sing.replace("'"," ").replace("."," ").replace("?"," ").replace("\n"," ")
print(newSing)
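
Chaining replace() works, but it gets long once every separator in the heading is covered. As a minimal sketch (not the method used in this post), str.maketrans and str.translate can map all of them to spaces in one pass. The names sample, separators, table, and cleaned are made up for illustration, and the sample text stands in for the full lyrics.

```python
# Hypothetical sample text; the post itself works on the full lyrics in `sing`.
sample = "i'm lost, really? yes! go: now."

# Build a translation table that maps each separator character to a space.
separators = ",.?!':"
table = str.maketrans(separators, " " * len(separators))

# Apply the table to the whole string in a single pass.
cleaned = sample.translate(table)
print(cleaned)
```

The cleaned string still contains runs of spaces, which a later split() with no arguments collapses.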


2. Convert all lowercase letters to uppercase

newSmall = newSing.upper()
print(newSmall)


3. Build the word list

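# split() with no arguments splits on any whitespace (spaces and newlines)
# and drops the empty strings that split(" ") would leave behind
listWord = newSing.split()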
print(listWord)
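
Replacing the apostrophe with a space turns "i'm" into the two tokens "i" and "m". If contractions should stay whole, one alternative is to extract words with a regular expression instead of splitting. This is only a sketch of that alternative, not what the post does; the pattern and the names text and words are assumptions chosen for illustration.

```python
import re

# Hypothetical two-line excerpt standing in for the full lyrics.
text = "i'm just a little bit\ncan't do it alone"

# Match runs of letters, optionally joined by apostrophes (i'm, can't).
words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
print(words)
```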


4. Build the word frequency counts

# Map each word to the number of times it appears
DicWord = {}
for word in listWord:
    if word in DicWord:
        DicWord[word] += 1
    else:
        DicWord[word] = 1
print(DicWord)
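
The loop above is the manual version of counting. The standard library's collections.Counter does the same bookkeeping, so a shorter equivalent might look like the sketch below. The word list here is a small made-up sample rather than the full listWord.

```python
from collections import Counter

# Hypothetical short word list; in the post this would be listWord.
words = "just enjoy the show just enjoy the show i want my money back".split()

# Counter builds the word -> count mapping in one call.
counts = Counter(words)
print(counts["just"], counts["show"], counts["money"])
```

A Counter is a dict subclass, so the later steps that index or delete keys still work on it.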


5. Sort

Dec = sorted(DicWord.keys())
print(Dec)
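
Sorting the keys alone gives an alphabetical word list without the counts. If both are wanted, one option is to sort the (word, count) pairs with a tuple key: count descending first, then word ascending to break ties. A small sketch with made-up counts:

```python
# Hypothetical counts standing in for DicWord.
counts = {"show": 2, "enjoy": 2, "money": 1, "back": 1}

# Negate the count so larger counts come first; ties fall back to the word.
ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
print(ordered)
```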


6. Exclude grammatical words: pronouns, articles, conjunctions

vocalbuary = ["a","so","the","they","is","in","to","of","i"]
for word in vocalbuary:
    # pop() with a default avoids a KeyError if a word is not in the text
    DicWord.pop(word, None)
print(DicWord)
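
Deleting keys changes the dictionary in place. If the original counts should be kept around, a dict comprehension can build a filtered copy instead. This is a sketch with made-up data; stop_words and filtered are illustrative names, not ones used in the post.

```python
# Hypothetical counts standing in for DicWord.
counts = {"i": 5, "show": 4, "the": 3, "money": 2}

# A set gives fast membership tests for the words to exclude.
stop_words = {"a", "the", "i", "of", "and"}

# Keep only the entries whose word is not a stop word.
filtered = {word: n for word, n in counts.items() if word not in stop_words}
print(filtered)
```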


7. Print the 10 most frequent words

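# Sort the (word, count) pairs by count, highest first
NewDicWord = sorted(DicWord.items(),key=lambda item:item[1],reverse=True)
print(NewDicWord)

# Slicing is safe even when there are fewer than 10 distinct words
for item in NewDicWord[:10]:
    print(item)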
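
If the counts live in a collections.Counter, its most_common method returns the top entries directly, so the separate sort is not needed. A short sketch with made-up counts, asking for the top 3 rather than the top 10 to keep the sample small:

```python
from collections import Counter

# Hypothetical counts; the real values depend on the lyrics.
counts = Counter({"show": 9, "just": 12, "money": 6, "enjoy": 7})

# most_common(n) returns the n highest-count (word, count) pairs.
top = counts.most_common(3)
print(top)
```

Asking for more entries than exist simply returns all of them, so most_common(10) is safe on short texts too.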

I did quite a lot in this exercise, and much of it came from looking things up online, which also helped solidify my own fundamentals~

Original post: https://www.cnblogs.com/qazwsx833/p/8619743.html