Fixing garbled saved files when crawling with Scrapy, the open-source Python project

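When Scrapy crawls a page, the saved file can come out garbled. The cause turned out to be the character encoding: the page is not encoded as UTF-8, so converting the content to UTF-8 before saving fixes it. Code snippet: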

......

import chardet

......

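# Guess the encoding of the raw page bytes
content_type = chardet.detect(html_content)
encoding = content_type['encoding']
#print(encoding)
# chardet returns None when it cannot guess, and the name's case varies (e.g. "utf-8")
if encoding and encoding.lower() != "utf-8":
    # bytes in the detected encoding -> Unicode string -> UTF-8 bytes
    html_content = html_content.decode(encoding)
    html_content = html_content.encode("utf-8")
with open(filename, "wb") as f:
    f.write(html_content)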

....

The saved file now shows the Chinese text correctly.

Steps:

First, decode the GB2312-encoded bytes into Unicode.

Then encode the Unicode text as UTF-8.
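The two steps can be tried in isolation with a minimal sketch that uses only the Python 3 standard library. The helper name to_utf8 and the sample string are assumptions made for illustration and are not part of the original spider. The source encoding is passed in explicitly here rather than detected with chardet.

```python
def to_utf8(raw_bytes, source_encoding):
    """Re-encode bytes from source_encoding to UTF-8 (hypothetical helper)."""
    # Step 1: decode the source bytes into a Unicode string.
    unicode_text = raw_bytes.decode(source_encoding)
    # Step 2: encode the Unicode string as UTF-8 bytes.
    return unicode_text.encode("utf-8")


# Sample data standing in for a downloaded page body encoded as GB2312.
sample = "中文网页"
gb2312_bytes = sample.encode("gb2312")

utf8_bytes = to_utf8(gb2312_bytes, "gb2312")

# The round trip preserves the text; only the byte representation changes.
print(utf8_bytes.decode("utf-8") == sample)
```

The result is a bytes object, so it can be written directly to a file opened in "wb" mode, as in the snippet above.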