Encoding in Python 2 and Python 3

Encoding in Python 2

# str: a byte string

>>> s = '你好 world'
>>> print repr(s)
'\xe4\xbd\xa0\xe5\xa5\xbd world'
>>> print len(s)
12
>>> print type(s)
<type 'str'>

# unicode: a Unicode string

>>> s = u'你好 world'
>>> print repr(s)
u'\u4f60\u597d world'
>>> print len(s) 
8
>>> print type(s)
<type 'unicode'>

# unicode: every character, whatever the script, has a corresponding code point in Unicode.
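
The two listings above can be reproduced in Python 3, where the roles are renamed: bytes plays the part of the Python 2 str, and str plays the part of unicode. The following is a minimal sketch, assuming UTF-8 as the byte encoding; the variable names are illustrative only.

```python
# Python 3 sketch: str is Unicode text, bytes is its encoded form.
text = '你好 world'
data = text.encode('utf-8')   # assumption: UTF-8, matching the listings above

# Every character maps to exactly one Unicode code point.
code_points = [hex(ord(ch)) for ch in text[:2]]

print(len(text))     # number of characters: 8
print(len(data))     # number of bytes: 12
print(code_points)   # ['0x4f60', '0x597d']
```

The character count and the byte count differ because each of the two Chinese characters takes three bytes in UTF-8.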

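Python 2 characteristics
1. In Python 2, print does not escape the bytes of a str: it writes them to stdout unchanged, and a UTF-8 terminal displays them as characters, whereas repr shows the \x escapes.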

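2. Python 2 assumes ASCII as the default encoding, so a source file containing non-ASCII characters needs an encoding declaration.
[root@localhost ~]# cat python.py
#coding:utf8 # tell the interpreter that this file is encoded in UTF-8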
print '你好'
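
To see why the declaration matters, note that ASCII covers only code points 0 to 127, so it cannot represent the characters in the script. The sketch below illustrates that limitation in Python 3; it does not reproduce the Python 2 source parser itself, and the variable names are hypothetical.

```python
# Python 3 sketch: ASCII cannot represent Chinese characters.
text = '你好'

try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print('cannot encode as ASCII:', exc.reason)

# UTF-8 can encode any Unicode character.
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)
```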

Encoding in Python 3
Python 3 uses UTF-8 as the default encoding, and str holds Unicode text.

>>> import json
>>> s = '你好 world'
>>> print (json.dumps(s))
"\u4f60\u597d world"
>>> print (len(s))
8
>>> print (type(s))
<class 'str'>
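
json.dumps serves above only as a way to display the code points: by default it escapes every non-ASCII character. A short sketch of that behaviour, using the standard ensure_ascii parameter of json.dumps (the names escaped and literal are illustrative):

```python
import json

s = '你好 world'

escaped = json.dumps(s)                       # non-ASCII written as \uXXXX
literal = json.dumps(s, ensure_ascii=False)   # characters kept as they are

print(escaped)
print(literal)
print(json.loads(escaped) == s)   # the escapes decode back to the same text
```

Both forms are valid JSON for the same string; only the serialized representation differs.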

Encoding and decoding, method 1: str.encode() and bytes.decode()

>>> s = '你好 world' 
>>> b = s.encode('utf8')
>>> print (b)
b'\xe4\xbd\xa0\xe5\xa5\xbd world'
>>> s = b.decode('utf8')
>>> print (s)
你好 world
>>> s = b.decode('gbk')
>>> print (s)
浣犲ソ world
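
The last decode above succeeds but yields garbled text, because the UTF-8 bytes happen to form valid GBK sequences. The reverse mismatch behaves differently: GBK bytes are usually not valid UTF-8, so decoding raises an error. The following sketch shows both cases, plus the standard errors='replace' option; the variable names are illustrative.

```python
s = '你好 world'
b = s.encode('utf8')

# Wrong codec, but every byte pair is valid GBK: no exception, just mojibake.
garbled = b.decode('gbk')
# Reversing the mistake recovers the original text.
repaired = garbled.encode('gbk').decode('utf8')

# GBK bytes decoded as UTF-8: strict mode raises UnicodeDecodeError.
gbk_bytes = s.encode('gbk')
try:
    gbk_bytes.decode('utf8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='replace' substitutes U+FFFD for the bytes that cannot be decoded.
lenient = gbk_bytes.decode('utf8', errors='replace')

print(garbled)
print(repaired)
print(strict_failed)
print(lenient)
```

Reversing the mistake works here only because the wrong decode lost no information; after errors='replace' the original bytes can no longer be recovered.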

Encoding and decoding, method 2: the bytes() and str() constructors

>>> s = '你好 world' 
>>> b = bytes(s,'gbk')
>>> print (b) 
b'\xc4\xe3\xba\xc3 world'
>>> s = str(b,'gbk')
>>> print (s)
你好 world


>>> s = '你好 world' 
>>> b = bytes(s,'utf8') 
>>> print (b)
b'\xe4\xbd\xa0\xe5\xa5\xbd world'
>>> s = str(b,'utf8')
>>> print (s)
你好 world
>>> s = str(b,'gbk') 
>>> print (s)
浣犲ソ world
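
The two methods are interchangeable: bytes(s, codec) gives the same result as s.encode(codec), and str(b, codec) the same as b.decode(codec). A small sketch that checks this for both codecs used above (the loop and variable names are illustrative):

```python
s = '你好 world'

sizes = {}
for codec in ('utf8', 'gbk'):
    via_method = s.encode(codec)   # method 1
    via_ctor = bytes(s, codec)     # method 2
    assert via_method == via_ctor
    assert str(via_ctor, codec) == via_method.decode(codec) == s
    sizes[codec] = len(via_method)

print(sizes)   # {'utf8': 12, 'gbk': 10}

# Pitfall: str() without a codec does not decode, it returns the repr.
no_codec = str(b'hi')
print(no_codec)   # b'hi'
```

The byte counts differ because GBK uses two bytes per Chinese character where UTF-8 uses three.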