Finding low-frequency words in a corpus: removing useless information

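1. When clustering text, some low-frequency words are word-segmentation errors or carry no useful information, so they should be removed during preprocessing.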

Key code:

from collections import Counter

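def func_counter(word_list):
    # Count how many times each word occurs in word_list
    count_result = Counter(word_list)
    # print(count_result)   # a Counter (dict subclass) mapping word -> count
    # print(count_result.keys())   # the distinct words
    # print(count_result.values())   # the count of each word
    # print(list(count_result.elements()))  # every word repeated by its count (same items as word_list, grouped by word)
    # print(count_result.most_common(3))   # the 3 most frequent (word, count) pairs
    return count_result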
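
The function above only counts words. One way to then find and drop the low-frequency ones might look like the sketch below. The names find_low_freq_words and remove_low_freq_words, the min_count threshold of 2, and the sample tokens are all assumptions made for illustration and are not from the original post. A suitable threshold depends on the corpus.

```python
from collections import Counter


def find_low_freq_words(word_list, min_count=2):
    """Return the set of words that occur fewer than min_count times.

    min_count is a hypothetical threshold chosen for illustration.
    """
    counts = Counter(word_list)
    return {word for word, n in counts.items() if n < min_count}


def remove_low_freq_words(word_list, min_count=2):
    """Drop low-frequency words while keeping the original token order."""
    low_freq = find_low_freq_words(word_list, min_count)
    return [word for word in word_list if word not in low_freq]


# Made-up sample tokens: "xqz" and "brid" stand in for segmentation noise
tokens = ["cat", "dog", "cat", "xqz", "dog", "cat", "brid"]
print(find_low_freq_words(tokens))    # the words that appear only once
print(remove_low_freq_words(tokens))  # ['cat', 'dog', 'cat', 'dog', 'cat']
```

Collecting the low-frequency words into a set first keeps the filtering pass linear in the number of tokens, and it leaves the list of removed words available for inspection before anything is discarded.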

Original article: https://www.cnblogs.com/demo-deng/p/11933460.html