Python data analysis: word clouds and charts for visualizing the US presidential debate

This post walks through a complete data analysis workflow in Python, focusing on three modules: pandas, wordcloud, and matplotlib.

1. Load the data and look at its structure and content:

import pandas as pd
mytext = pd.read_csv(r'F:\kaggle data\2016-us-presidential-debates\test.csv', encoding='iso-8859-1')

>>> mytext.head(2)  # inspect the structure of the data
   Line   Speaker                                               Text  
0     1      Holt  Good evening from Hofstra University in Hempst...   
1     2  Audience                                         (APPLAUSE)   

        Date  
0  2016/9/26  
1  2016/9/26
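
Beyond head(), a few more checks help before cleaning. The sketch below is a minimal example on made-up rows that only mimic the Line / Speaker / Text layout shown above; the sample values and the name lines_per_speaker are assumptions for illustration, not part of the dataset.

```python
import pandas as pd

# Made-up rows that imitate the column layout of the debate file.
sample = pd.DataFrame({
    "Line": [1, 2, 3, 4, 5],
    "Speaker": ["Holt", "Audience", "Clinton", "Trump", "Clinton"],
    "Text": ["Good evening.", "(APPLAUSE)", "Thank you.",
             "Thank you, Lester.", "Let me respond."],
})

print(sample.shape)           # (number of rows, number of columns)
print(sample.isnull().sum())  # count of missing values per column

# How many lines each speaker has in the transcript.
lines_per_speaker = sample["Speaker"].value_counts()
print(lines_per_speaker)
```

Running the same three calls on mytext should show which speakers appear and whether any column has missing values.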

2. Clean the data (remove abnormal rows, add any fields you need, and handle empty fields in a simple way)

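text = mytext.iloc[7:26, :].reset_index(drop=True)  # row operation: keep rows 7 to 25; the first 7 rows are pleasantries and can be dropped
del text['Date']  # column operation: delete the Date column
## text.insert(3, "new_column", new_column_values)  # how to add a new column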
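
The snippet above does not show the empty-field handling or a concrete new field, so here is a minimal sketch of both. It runs on made-up rows, and the column name WordCount is a hypothetical choice for this example.

```python
import pandas as pd

# Made-up rows in the same layout as the debate file; one Text is missing.
sample = pd.DataFrame({
    "Line": [1, 2, 3, 4],
    "Speaker": ["Holt", "Audience", "Clinton", "Trump"],
    "Text": ["Good evening.", "(APPLAUSE)", None, "Thank you, Lester."],
    "Date": ["2016/9/26"] * 4,
})

# Drop rows whose Text is missing, then renumber the index.
cleaned = sample.dropna(subset=["Text"]).reset_index(drop=True)

# Drop the Date column, which holds the same value on every row here.
cleaned = cleaned.drop(columns=["Date"])

# Add a derived field: number of whitespace-separated words per line.
# "WordCount" is a hypothetical column name.
cleaned.insert(len(cleaned.columns), "WordCount",
               cleaned["Text"].str.split().str.len())

print(cleaned)
```

Whether to drop rows with missing text or fill them (for example with an empty string via fillna) depends on what the later steps need.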

3. Build a word cloud of each candidate's remarks:

import matplotlib.pyplot as plt
from wordcloud import WordCloud  # word cloud library
import nltk
from nltk.corpus import stopwords  # stop word corpus; run nltk.download('stopwords') once if it is missing
stop_words = set(stopwords.words("english"))  # NLTK's English stop words
stop_words |= {"will", "yes"}  # extra words to exclude
words = " ".join(text[text.Speaker == 'Clinton']['Text'])  # gather all of Clinton's remarks into one string
cloud = WordCloud(background_color="white", width=3000, height=2500, stopwords=stop_words).generate(words)
plt.figure(1, figsize=(8, 8))
plt.imshow(cloud)
plt.show()

PS: this is only a simple visualization; feel free to experiment with it yourself.
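
A word cloud sizes each word by how often it occurs once stop words are removed. To make that concrete, the sketch below counts word frequencies with the standard library only. The stop word set and the sample sentences are made up for illustration, and top_words is a hypothetical helper, not a function from wordcloud or nltk.

```python
import re
from collections import Counter

# A tiny hand-picked stop word set, for illustration only.
STOP_WORDS = {"the", "a", "and", "to", "of", "we", "will", "yes", "is", "in"}

def top_words(text, n=3):
    """Return the n most common non-stop words in text, lowercased."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Made-up sentences, not taken from the debate transcript.
sample = "We will create jobs. Jobs and the economy. The economy needs jobs."
print(top_words(sample, 2))  # [('jobs', 3), ('economy', 2)]
```

Printing such a table next to the cloud is a quick way to check that the largest words in the image really are the most frequent ones.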

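4. Dig further into the data [e.g. US media reported that Trump was interrupted frequently during the first presidential debate; we can use the data to examine this claim]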

import numpy as np
trump = [3, 5, 7]    # 'Making laugh', 'Making applaud', 'Be interrupted'
clinton = [3, 3, 2]  # 'Making laugh', 'Making applaud', 'Be interrupted'
ind = np.arange(len(trump))  # x positions of the three groups
fig, ax = plt.subplots()
width = 0.35  # width of each bar
rects1 = ax.bar(ind, trump, width, color='r')
rects2 = ax.bar(ind + width, clinton, width, color='y')
ax.set_ylabel('Counts')
ax.set_title('Counts of moderator and audience behavior')
ax.set_xticks(ind)
ax.set_xticklabels(('Making laugh', 'Making applaud', 'Be interrupted'), rotation=45)
ax.legend((rects1[0], rects2[0]), ('Trump', 'Clinton'))
plt.show()

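Digging along these lines shows that Trump was interrupted frequently. There is more to explore, such as what Trump was talking about when he was interrupted (a word cloud like the one above would work; try it if you are interested).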
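
One possible way to derive such counts from the transcript, rather than typing them in by hand, is sketched below. It assumes that an Audience row such as (APPLAUSE) is a reaction to whoever spoke on the row just before it; that rule, the made-up rows, and the column name PreviousSpeaker are all assumptions for illustration and may not match how the numbers above were obtained.

```python
import pandas as pd

# Made-up transcript rows in the same Speaker / Text layout as the dataset.
debate = pd.DataFrame({
    "Speaker": ["Trump", "Audience", "Clinton", "Audience",
                "Trump", "Audience", "Clinton"],
    "Text": ["First remark.", "(APPLAUSE)", "Second remark.", "(LAUGHTER)",
             "Third remark.", "(APPLAUSE)", "Fourth remark."],
})

# Assumption: an Audience row reacts to the speaker on the previous row.
debate["PreviousSpeaker"] = debate["Speaker"].shift(1)

# Keep only audience reactions and count them per speaker and reaction type.
reactions = debate[debate["Speaker"] == "Audience"]
counts = (
    reactions.groupby(["PreviousSpeaker", "Text"])
    .size()
    .unstack(fill_value=0)
)
print(counts)
```

The same shift trick can pull out the text of the row before each interruption, which could then be joined into one string and passed to the word cloud step.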