Bytes and characters in Python and Perl

[oracle@node01 python]$ cat t4.py 
# -*- coding: utf-8 -*-
a='中国'
b=u'中国'
print a
print len(a)
print b
print len(b)
[oracle@node01 python]$ python t4.py 
中国
6
中国
2
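
The two lengths differ because a is a byte string holding the UTF-8 encoding of the text (three bytes per character here), while b is a Unicode string of two characters. A minimal sketch of the same comparison, assuming Python 3 rather than the Python 2 used above (in Python 3 a plain str is already Unicode and byte strings are a separate bytes type); the variable names are illustrative only:

```python
# -*- coding: utf-8 -*-
# Assumes Python 3: str is a sequence of characters, bytes is a sequence of bytes.
text = '中国'                 # character string (like u'中国' in Python 2)
raw = text.encode('utf-8')    # byte string (like plain '中国' in a UTF-8 Python 2 source file)

print(len(raw))    # number of bytes
print(len(text))   # number of characters
```

Under these assumptions the first length counts bytes and the second counts characters, which mirrors the 6 and 2 printed by t4.py.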


>>> a='中国'
>>> a
'\xe4\xb8\xad\xe5\x9b\xbd'
>>> print a
中国


>>> b=u'中国'
>>> b
u'\u4e2d\u56fd'
>>> print b
中国


Converting bytes to characters (decode):
>>> print '\xe4\xb8\xad\xe5\x9b\xbd'.decode('utf8')
中国

Converting characters to bytes (encode):
>>> print u'\u4e2d\u56fd'.encode('utf8')
中国
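
The decode and encode calls above are inverses of each other. A small round-trip sketch, again assuming Python 3 (so the byte literal needs a b prefix and the u prefix is optional); the names raw, text, back and per_char are hypothetical:

```python
# Assumes Python 3. The bytes below are the UTF-8 encoding of the two characters.
raw = b'\xe4\xb8\xad\xe5\x9b\xbd'

text = raw.decode('utf-8')     # bytes -> characters
back = text.encode('utf-8')    # characters -> bytes

# How many UTF-8 bytes each character occupies.
per_char = [len(ch.encode('utf-8')) for ch in text]

print(text == '\u4e2d\u56fd')  # same two code points as u'\u4e2d\u56fd'
print(back == raw)             # the round trip restores the original bytes
print(per_char)
```

Each of the two characters takes three bytes in UTF-8, which is where the 6 versus 2 difference comes from.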



Perl:

[oracle@node01 python]$ cat t1.pl
my $a='中国';
print length($a);
print "\n";
[oracle@node01 python]$ perl t1.pl
6

[oracle@node01 python]$ cat t2.pl
use utf8;
my $a='中国';
print length($a);
print "\n";
[oracle@node01 python]$ perl t2.pl
2



[oracle@node01 python]$ cat t3.pl
use Encode;
my $a='中国';
print length($a);
print "\n";
my $b=decode_utf8('中国');
print length($b);
print "\n";
[oracle@node01 python]$ perl t3.pl
6
2

decode_utf8  

 $string = decode_utf8($octets [, CHECK]);

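Decodes bytes into characters.

Equivalent to  $string = decode("utf8", $octets [, CHECK]).

The sequence of octets in $octets is decoded from (loose, not strict) utf8 into a sequence of logical characters.

Because not every sequence of octets is valid utf8, this function can fail.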
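
To illustrate what a failed decode looks like, here is a rough Python analogue (not Perl's Encode, whose behavior on bad input is governed by the CHECK argument). It assumes Python 3 and uses a deliberately truncated byte sequence; the names truncated, strict_ok and lenient are hypothetical:

```python
# Assumes Python 3. The second character is missing its final byte (\xbd),
# so the sequence is not valid UTF-8.
truncated = b'\xe4\xb8\xad\xe5\x9b'

# Strict decoding (Python's default) raises an exception on malformed input.
try:
    truncated.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding substitutes U+FFFD (REPLACEMENT CHARACTER) for the bad bytes.
lenient = truncated.decode('utf-8', errors='replace')

print(strict_ok)
print(lenient)
```

The valid first character still decodes; only the incomplete tail is rejected or replaced, depending on the error handling chosen.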

[oracle@node01 python]$ cat t3.pl
use Encode;
my $a='中国';
print length($a);
print "\n";
my $b=decode_utf8('中国');
print length($b);
print "\n";
my $c=encode_utf8($b);
print length($c);
print "\n";
[oracle@node01 python]$ perl t3.pl
6
2
6

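encode_utf8 is equivalent to $octets = encode("utf8", $string).

The characters in $string, which Perl holds in its internal format, are encoded as utf8, and the result is returned as a sequence of octets.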