Python String Encoding Conversion: the encode() and decode() Methods



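Python 3.x uses UTF-8 as its default encoding (for source files and for str.encode()/bytes.decode()), which largely solves the problem of garbled Chinese text.
Our company currently uses Python 2.x.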

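Python has two commonly used string types, str and bytes: str represents Unicode text, and bytes represents binary data. (This is the Python 3 model. In Python 2.x, str holds raw bytes and Unicode text uses the separate unicode type, so there encode() goes from unicode to str and decode() goes from str to unicode.)
Converting between str and bytes is done with the encode() and decode() methods.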
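A minimal sketch of the relationship between the two types, assuming Python 3; the variable names are only illustrative:

```python
text = "中文"                        # str: a sequence of Unicode code points
data = text.encode("utf-8")          # bytes: the UTF-8 representation of those code points

print(type(text), len(text))         # <class 'str'> 2
print(type(data), len(data))         # <class 'bytes'> 6
print(data.decode("utf-8") == text)  # True: decoding reverses encoding
```

Each Chinese character occupies one element of the str but three bytes in UTF-8, which is why the two types have to be converted explicitly rather than mixed.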

Python encode() Method
encode() performs "encoding": it converts str to bytes.
The syntax of encode() is:
str.encode([encoding="utf-8"][,errors="strict"])
Note: parameters enclosed in square brackets [] are optional; you may pass them or omit them.
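For example, assuming Python 3: calling encode() with no arguments uses the default UTF-8, while passing encoding selects another codec such as GBK, a common legacy encoding for Simplified Chinese:

```python
s = "中文"

default_bytes = s.encode()               # encoding omitted -> "utf-8"
utf8_bytes = s.encode(encoding="utf-8")  # same result, written explicitly
gbk_bytes = s.encode("gbk")              # same text, different byte layout

print(default_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'
print(gbk_bytes)      # b'\xd6\xd0\xce\xc4'
```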

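The meaning of each parameter is shown in Table 1: encoding specifies the character encoding to use (default "utf-8"); errors specifies how characters that cannot be encoded are handled (default "strict", which raises UnicodeEncodeError; other accepted values include "ignore", "replace", and "xmlcharrefreplace").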
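The errors parameter only matters when a character cannot be represented in the target encoding. A short sketch, assuming Python 3 and using ASCII as the target so that the Chinese characters are unencodable:

```python
s = "中文abc"

try:
    s.encode("ascii")  # errors="strict" (the default) raises on unencodable characters
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)  # strict: ordinal not in range(128)

print(s.encode("ascii", errors="ignore"))             # b'abc'
print(s.encode("ascii", errors="replace"))            # b'??abc'
print(s.encode("ascii", errors="xmlcharrefreplace"))  # b'&#20013;&#25991;abc'
```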

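Note that encode() does not modify the original string; it returns a new bytes object. To change what the original variable refers to, you must reassign it.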
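Because Python strings are immutable, the result of encode() has to be stored somewhere. A minimal illustration, assuming Python 3:

```python
s = "hello"
s.encode()            # result is discarded; s is unchanged
print(type(s))        # <class 'str'>

encoded = s.encode()  # keep the result in a new variable...
s = s.encode()        # ...or rebind the original name
print(type(s))        # <class 'bytes'>
```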


Python decode() Method
decode() performs "decoding": it converts binary data of type bytes back to str.

The syntax of decode() is:
bytes.decode([encoding="utf-8"][,errors="strict"])
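decode() must be given the same encoding that produced the bytes. A sketch assuming Python 3, starting from the UTF-8 and GBK byte sequences for "中文":

```python
utf8_bytes = b"\xe4\xb8\xad\xe6\x96\x87"  # "中文" encoded as UTF-8
gbk_bytes = b"\xd6\xd0\xce\xc4"           # "中文" encoded as GBK

print(utf8_bytes.decode())      # 中文 (encoding omitted -> "utf-8")
print(gbk_bytes.decode("gbk"))  # 中文

# Decoding with the wrong codec either fails or yields garbled text (mojibake).
try:
    gbk_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("wrong codec:", exc.reason)
```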
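The meaning of each parameter is shown in Table 2: encoding must match the encoding that produced the bytes (default "utf-8"); errors specifies how invalid bytes are handled (default "strict", which raises UnicodeDecodeError; "ignore" and "replace" are also accepted).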
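As with encode(), errors controls what happens to bytes that are invalid for the chosen encoding. A sketch assuming Python 3, deliberately decoding UTF-8 bytes as ASCII:

```python
data = "中文abc".encode("utf-8")  # six non-ASCII bytes followed by b"abc"

# errors="strict" (the default) would raise UnicodeDecodeError here.
ignored = data.decode("ascii", errors="ignore")    # drops the invalid bytes
replaced = data.decode("ascii", errors="replace")  # each invalid byte -> U+FFFD

print(ignored)                             # abc
print(replaced == "\ufffd" * 6 + "abc")    # True
```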

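In Python 2.x, the following error often appears when a byte string containing non-ASCII data (for example, UTF-8-encoded Chinese) is implicitly converted to unicode, because the default encoding there is ASCII:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 0: ordinal not in range(128)
A common workaround is to add the following code at the top of the script (Python 2.x only; it changes the interpreter-wide default and is generally discouraged in favor of explicit decode() calls):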

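# Python 2.x only: make implicit str <-> unicode conversions use UTF-8 instead of ASCII
import sys
reload(sys)  # site.py deletes sys.setdefaultencoding at startup; reload() restores it
sys.setdefaultencoding('utf-8')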
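In Python 3 the default-encoding hack above is neither available nor needed: reload() is no longer a builtin and sys.setdefaultencoding() does not exist. The usual fix is to decode bytes explicitly with the codec they were written in. A minimal sketch, where the byte string stands in for assumed UTF-8 input:

```python
raw = "中文".encode("utf-8")  # assumed input: UTF-8 bytes read from a file or socket

# Decoding with the wrong codec reproduces the same kind of error.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

# Fix: decode explicitly with the codec the bytes were actually written in.
text = raw.decode("utf-8")
print(text)  # 中文
```

When the bytes come from a file, passing encoding="utf-8" to open() performs the same explicit decoding.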