Characters and encodings in Python 2/3

1. Python 3 has two types for character sequences: bytes and str. A bytes object holds raw 8-bit values; a str object holds Unicode characters.

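2. Python 2 has two types for character sequences: str and unicode. A str is a sequence of raw 8-bit values
(it holds bytes in whatever encoding produced them, not only pure ASCII text) and corresponds to bytes in Python 3;
a unicode object holds Unicode characters and corresponds to str in Python 3.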

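3. Encoding and decoding
Encoding: encode() turns a text string into a byte sequence for storage or transmission.
Decoding: decode() turns a byte sequence back into a human-readable text string.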
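
A minimal Python 3 sketch of the round trip described above. The variable names are illustrative only, and utf-8 is chosen here as an assumption; any codec that can represent the text would work the same way.

```python
# Encode: text (str) -> bytes for storage or transmission.
text = '中文'
data = text.encode('utf-8')

# Decode: bytes -> text (str) using the same encoding.
restored = data.decode('utf-8')

print(type(data), len(data))          # bytes; each of these characters takes 3 bytes in utf-8
print(type(restored), len(restored))  # str; 2 characters
print(restored == text)
```

The byte length and the character length differ, which is why bytes and str must not be treated as interchangeable.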

4. Converting between str and unicode in Python 2:
str --decode--> unicode --encode--> str
5. Converting in Python 3:
bytes --decode--> str --encode--> bytes
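
The Python 3 chain above also explains how to move data from one encoding to another: decode to str first, then encode again. A small sketch, assuming the input bytes are known to be gbk (the names are made up for illustration):

```python
# Bytes that arrived in gbk, e.g. read from a legacy file.
gbk_bytes = '中文'.encode('gbk')

# bytes --decode--> str: str is the encoding-neutral middle step.
text = gbk_bytes.decode('gbk')

# str --encode--> bytes: re-encode in the target encoding.
utf8_bytes = text.encode('utf-8')

print(len(gbk_bytes), len(utf8_bytes))  # gbk uses 2 bytes per character here, utf-8 uses 3
```

There is no direct bytes-to-bytes conversion step; going through str keeps the source and target encodings explicit.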

# In Python 2
>>> type('a')
<type 'str'>

>>> type('a'.decode('utf-8'))
<type 'unicode'>

>>> type(u'a'.encode('utf-8'))
<type 'str'>

>>> type(u'中文')
<type 'unicode'>

# In Python 3
>>> type('a')
<class 'str'>

>>> type(b'a')
<class 'bytes'>

>>> type(b'a'.decode('utf-8'))
<class 'str'>

>>> type('a'.encode('utf-8'))
<class 'bytes'>

>>> type(u'中文')
<class 'str'>

●Python 2 default encoding: ascii
●Python 3 default source file encoding: utf-8 (the encoding the interpreter assumes when reading source)

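In Python 3, every string in memory is Unicode.
How Python 3 runs code:
1. The interpreter finds the source file, decodes it using the encoding declared in the file header (utf-8 if none is declared), and loads it into memory as Unicode.
2. It interprets the source text according to Python's syntax rules.
3. Every string literal assigned to a variable is stored as Unicode.
※On Simplified Chinese Windows the default system encoding is gbk.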
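
To check which defaults apply on a given machine, the standard library exposes them directly. This is only a sketch for Python 3; the values other than the first depend on the operating system and its language settings.

```python
import locale
import sys

# Encoding Python 3 uses by default for str.encode() / bytes.decode().
default_encoding = sys.getdefaultencoding()

# Encoding used to convert file names between str and bytes.
filesystem_encoding = sys.getfilesystemencoding()

# Encoding the locale prefers; platform dependent
# (for example a gbk code page on Simplified Chinese Windows).
preferred_encoding = locale.getpreferredencoding(False)

print(default_encoding)
print(filesystem_encoding)
print(preferred_encoding)
```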

●Python 2
    Default file encoding: ascii
    Default string encoding: ascii
    If the file header declares utf-8, string literals are encoded as utf-8

●Python 3
    Default file encoding: utf-8
    String encoding: unicode

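●File header:
Python 2: for source saved as utf-8 or gbk, the contents loaded into memory are not converted to Unicode; string literals stay utf-8 or gbk bytes.
Python 3: for source saved as utf-8 or gbk (gbk needs a coding declaration in the header), the contents loaded into memory are automatically converted to Unicode.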
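
The Python 3 behaviour can be observed without creating a file. The sketch below builds a tiny gbk-encoded "source file" in memory (a made-up example, not taken from any real project), lets the standard tokenize module read the header declaration, and then runs the bytes. It assumes Python 3.

```python
import io
import tokenize

# Source code as raw gbk bytes, with a coding declaration in the header.
source_bytes = '# -*- coding: gbk -*-\ns = "中文"\n'.encode('gbk')

# detect_encoding() reads the header the same way the interpreter does.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)

# Compiling bytes honours the declaration; the literal ends up as a Unicode str.
namespace = {}
exec(compile(source_bytes, '<demo>', 'exec'), namespace)
text = namespace['s']

print(encoding)
print(type(text), len(text))
```

The string literal was stored as gbk bytes in the source, yet the running program only ever sees a str of two characters.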

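●Common causes of encoding errors:
    1. The Python interpreter's default encoding
    2. The encoding of the Python source file
    3. The encoding used by the terminal (Windows/Linux/macOS)
    4. The operating system's language settings
※Unicode code charts, for looking up code points when mapping between unicode and gbk: http://www.unicode.org/charts/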
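
Most of the causes above come down to bytes being decoded with a different encoding than the one that produced them. A minimal Python 3 reproduction, assuming gbk input that is wrongly treated as utf-8 (names are illustrative):

```python
# Bytes produced by a gbk environment (e.g. a gbk terminal or file).
data = '中文'.encode('gbk')

# Decoding with the wrong codec fails.
error = None
try:
    data.decode('utf-8')
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the codec that produced the bytes works.
text = data.decode('gbk')

# errors='replace' avoids the exception but loses the original characters.
lossy = data.decode('utf-8', errors='replace')

print(type(error).__name__)
print(text)
print(lossy)
```

Passing the encoding explicitly, rather than relying on interpreter, terminal, or system defaults, is usually the more robust choice.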

------山的那一边