python 基于 wordcloud + jieba + matplotlib 生成词云

词云##

词云是啥？词云突出一个数据可视化，酷炫。以前以为很复杂，不想python已经有成熟的工具来做词云。而我们要做的就是准备关键词数据，挑一款字体，挑一张模板图片，非常非常无脑。准备好了吗，快跟我一起动手吧

模块##

本案例基于python3.6，相关模块如下，安装都是直接 pip install <模块名>：

wordcloud 作用如其名。本例核心模块，它把我们带权重的关键词渲染成词云
matplotlib 绘图模块，主要作用是把wordcloud生成的图片绘制出来并在窗口展示
numpy 图像处理模块，读取图片生成像素矩阵
PIL (pip install pillow) 图片处理模块，打开初始化图片
jieba 牛逼的分词模块，因为我是从一个txt文本里提取关键词，所以需要 jieba 来分词并统计词频。如果是已经有了现成的数据，不再需要它

代码##

# -*- coding=utf8 -*-
import matplotlib.pyplot as plt
import jieba.analyse
import numpy
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator

def readTxt(file, encoding='utf8'):
    """
    :param file:
    :param encoding:
    :return:
    """
    with open(txt_file, 'r', encoding='utf16') as f:
        txt = f.read()
    return txt

def textDict(content):
    """
    jieba 提取1000个关键词及其比重
    :param content:
    :return:
    """
    result = jieba.analyse.textrank(content, topK=1000, withWeight=True)
    # 转化为比重字典
    keywords = dict()
    for i in result:
        keywords[i[0]] = i[1]
    return keywords

def renderWordCloud(keywords, sourceImg):
    # 获取图片资源
    image = Image.open(sourceImg)
    # 转为像素矩阵
    graph = numpy.array(image)

    # wordcloud 默认字体库不支持中文，这里自己选取中文字体
    fontPath = 'C:/Windows/Fonts/SIMLI.TTF'
    #fontPath = 'C:/Windows/Fonts/mplus-1mn-regular.ttf'
    wc = WordCloud(
        font_path=fontPath,
        background_color='white',
        max_words=1000,
        # 使用的词云模板背景
        mask=graph
    )
    # 基于关键词信息生成词云
    wc.generate_from_frequencies(keywords)
    # 读取模板图片的颜色
    image_color = ImageColorGenerator(graph)
    # 生成词云图
    plt.imshow(wc)
    # 用模板图片的颜色覆盖
    plt.imshow(wc.recolor(color_func=image_color))
    # 关闭图像坐标系
    plt.axis('off')
    # 显示图片--在窗口显示
    plt.show()


txt_file = 'C:/Users/KF/Downloads/《围城》钱钟书（完美版）.TXT'
source_img = 'C:/Users/KF/Pictures/ul1241-2001.jpg'
#source_img = 'C:/Users/KF/Pictures/微信图片_20170710102042.jpg'
#source_img = 'C:/Users/KF/Pictures/微信图片_20170710102054.jpg'
#source_img = 'E:DOCCarlwallpapersd250038c4fde4ea7f36ebe010a7b58ca.jpg'

content = readTxt(txt_file)
keywords = textDict(content)
renderWordCloud(keywords, source_img)

python 基于 wordcloud + jieba + matplotlib 生成词云

词云##

模块##

代码##

成果##