Bytes and characters (unicode) in Python

Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

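Input:

Now you can use print to output the results you want. But what if the user needs to type in some characters?

Python provides raw_input, which lets the user type a string and stores it in a variable.

For example, to read the user's name:

name = raw_input("what is your name?")
print name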
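
A minimal sketch of the same idea wrapped in a function, written so that it runs on both Python 2.7 and Python 3. The names ask_name, read_line and reader are illustrative assumptions, not part of the original example. Passing the reader in as a parameter makes the function usable without a keyboard.

```python
# Pick the line-reading function that exists on this interpreter.
try:
    read_line = raw_input   # Python 2
except NameError:
    read_line = input       # Python 3


def ask_name(prompt="what is your name? ", reader=read_line):
    """Ask for a name and return it without surrounding whitespace."""
    return reader(prompt).strip()
```

Calling ask_name() with no arguments prompts on the console; passing reader=some_function substitutes another source of input.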

# print absolute value of an integer:
a = 100
if a >= 0:
    print a
else:
    print -a


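Character encoding:

Python strings:

With the headache of character encodings sorted out, let's look at Python's Unicode support:

Python provides the ord() and chr() functions, which convert between a letter and its corresponding number:


print ord('A')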

C:\Python27\python.exe C:/Users/TLCB/PycharmProjects/untitled/t1.py
65
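
As a small sketch of the round trip, ord() and chr() can be combined to move from one letter to another. The helper name shift_letter is an assumption made for illustration, and the example stays within ASCII so that it behaves the same on Python 2.7 and Python 3.

```python
def shift_letter(letter, offset):
    """Return the character `offset` code points after `letter`."""
    return chr(ord(letter) + offset)


code = ord('A')                      # letter -> number
letter = chr(code)                   # number -> letter
next_letter = shift_letter('A', 1)   # the letter that follows 'A'

print(code)
print(letter)
print(next_letter)
```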

Python later added Unicode support. A string represented in Unicode is written as u'...', for example:
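
A minimal sketch of a u'...' literal, using \u escapes for the two characters of 中文 so that the source stays pure ASCII and needs no encoding declaration. The variable name text is an illustrative choice; the code runs on Python 2.7 and on Python 3, where the u prefix is accepted but optional.

```python
text = u'\u4e2d\u6587'   # the two characters of the word, written as escapes

print(len(text))                        # counts characters, not bytes
print(text == u'\u4e2d' + u'\u6587')    # built from two one-character strings
print(isinstance(text, type(u'')))      # it is the unicode string type
```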



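When writing Python, using Chinese in output or in comments makes the script fail with an error:

SyntaxError: Non-ASCII character '\xe5' in file *******

Solution:

Python 2 assumes source files are ASCII by default, but the file was saved as UTF-8. The fix is simple: add one of the following on the first or second line of the file

# -*- coding: UTF-8 -*-    or    #coding=utf-8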



\u767b\u5f55\u6210\u529f  unicode escapes converted to Chinese: 登录成功

\u4e2d converted to Chinese: 中
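
One way to do this conversion in code, sketched under the assumption that the escapes arrive as plain ASCII bytes (for example, read from a log file), is the standard unicode_escape codec. The helper name unescape is hypothetical.

```python
def unescape(escaped_bytes):
    """Turn ASCII bytes holding \\uXXXX escapes into a unicode string."""
    return escaped_bytes.decode('unicode_escape')


raw = b'\\u767b\\u5f55\\u6210\\u529f'   # 24 ASCII bytes: four 6-byte escapes
text = unescape(raw)                     # 4 characters

print(len(raw))
print(len(text))
```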

>>> u'中'
# c:\python27\lib\encodings\utf_32_be.pyc matches c:\python27\lib\encodings\utf_32_be.py
import encodings.utf_32_be # precompiled from c:\python27\lib\encodings\utf_32_be.pyc
u'\u4e2d'


# -*- coding: UTF-8 -*-
print len('中文')

C:\Python27\python.exe C:/Users/TLCB/PycharmProjects/untitled/t1.py
6

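A length of 6 for 中文 means len() is counting bytes here, not characters: the str holds the UTF-8 bytes of the text, three per Chinese character.

This is similar to Perl:

When the utf8 flag is on, "中国" is treated as a utf8 character string, so its length is 2.

When the utf8 flag is off, "中国" is treated as octets (a byte array), so the length comes out as 6.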

# -*- coding: UTF-8 -*-
print len(u'中文')


C:\Python27\python.exe C:/Users/TLCB/PycharmProjects/untitled/t1.py
2
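
The two results can be put side by side in one short sketch. It uses \u escapes instead of literal Chinese so that the source is pure ASCII, and it runs on Python 2.7 and Python 3; the variable names text and data are illustrative.

```python
text = u'\u4e2d\u6587'        # unicode string holding the two characters
data = text.encode('utf-8')   # byte string holding their UTF-8 encoding

print(len(text))   # number of characters
print(len(data))   # number of bytes
```

The first length is 2 and the second is 6, matching the two runs above.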


In Perl:

utf8:

[root@node01 go]# cat a1.pl 
my $a="中文";
print length($a);
print "\n";
[root@node01 go]# perl a1.pl
6

A Chinese character in UTF-8 takes three bytes


GBK:

[root@node01 ~]# cat a1.pl 
my $a="中文";
print length($a);
print "\n";
[root@node01 ~]# perl a1.pl
4
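
The same byte counts can be checked from Python, as a rough sketch using the standard utf-8 and gbk codecs. The helper name byte_length is an assumption for illustration.

```python
def byte_length(text, encoding):
    """Return how many bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


text = u'\u4e2d\u6587'   # the two characters used in the Perl scripts

for encoding in ('utf-8', 'gbk'):
    print('%s: %d bytes' % (encoding, byte_length(text, encoding)))
```

UTF-8 needs three bytes for each of these characters and GBK needs two, which is where the 6 and the 4 come from.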



utf8:

[root@node01 go]# cat a1.pl
use Encode;
my $a="中文";
 $a=decode_utf8($a);
print length($a);
print "\n";

[root@node01 go]# perl a1.pl
2

After decode_utf8, 中文 counts as only 2 characters







# -*- coding: utf-8 -*-
print len('中文')


C:\Python27\python.exe C:/Users/TLCB/PycharmProjects/untitled/t1.py
6


# -*- coding: utf-8 -*-
print len('中文'.decode('utf-8'))

print '中文'.decode('utf-8')

C:\Python27\python.exe C:/Users/TLCB/PycharmProjects/untitled/t1.py
2
中文


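In Python, the unicode type is the base type for all encoding work. That is:

        decode                  encode

bytes ---------> str(unicode) ---------> bytes



Bytes are decoded into characters, and characters are encoded into bytes. In Python 2, the byte type is str and the character type is unicode.


In Python, we use decode() and encode() to decode and encode


encode: characters become bytes

decode: bytes become characters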
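
The two directions can be sketched as a pair of small helpers. The names to_text and to_bytes are hypothetical, UTF-8 is assumed as the default encoding, and the code runs on Python 2.7 and Python 3.

```python
def to_text(data, encoding='utf-8'):
    """decode: bytes -> characters."""
    return data.decode(encoding)


def to_bytes(text, encoding='utf-8'):
    """encode: characters -> bytes."""
    return text.encode(encoding)


data = b'\xe4\xb8\xad\xe6\x96\x87'   # UTF-8 bytes of the two-character word
text = to_text(data)                 # 2 characters
gbk_data = to_bytes(text, 'gbk')     # the same characters re-encoded as GBK

print(len(data))
print(len(text))
print(len(gbk_data))
```

Converting between two encodings always goes through the character form in the middle: decode with the source encoding, then encode with the target encoding.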


The relationship between bytes and characters in Perl:

[root@node01 perl]# cat t1.pl 
use Net::SMTP;
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Headers;
use HTTP::Response;
use Encode;
use JSON;
use File::Temp qw/tempfile/;
use HTTP::Date qw(time2iso str2time time2iso time2isoz);
use Data::Dumper;
my $CurrTime = time2iso(time());
my $dis_mainpublish='中均资本';
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;
my $now          = time();
$ua->agent('Mozilla/5.0');
my $cookie_jar = HTTP::Cookies->new(

    file           => 'lwp_cookies.txt',
    autosave       => 1,
    ignore_discard => 1
);
$ua->cookie_jar($cookie_jar);
my $response = $ua->get("https://www.zjcap.cn/web/noauth?method=%2Fproduct%2Flist&duration=&entryUnit=&productType=1&status=&yield=&productName=&pageNum=1&pageSize=6&_=1468156037154");


if ($response->is_success) {
  $r = $response->decoded_content;   
  print "\n";
  }
else 
  {
  die $response->status_line;
  };
#my $r=encode_utf8($r);
my $hash = decode_json ( $r );
print "产品列表\n";
print $hash->{data}->{dataList}->[0]->{name};
print "\n";
print length($hash->{data}->{dataList}->[0]->{name});
print "\n";
[root@node01 perl]# perl t1.pl 

产品列表
Wide character in print at t1.pl line 39.
中均-至信230号
9


my $a=encode_utf8($hash->{data}->{dataList}->[0]->{name});
print "\n";
print length($a);
print "\n";

[root@node01 perl]# perl t1.pl 

产品列表
Wide character in print at t1.pl line 39.
中均-至信230号
9

19
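
The 9 and the 19 can be reproduced in Python as a sketch. The product name 中均-至信230号 is written with \u escapes to keep the source ASCII, and the variable names name and encoded are illustrative.

```python
# Five Chinese characters plus the four ASCII characters "-230".
name = u'\u4e2d\u5747-\u81f3\u4fe1230\u53f7'
encoded = name.encode('utf-8')

print(len(name))      # characters
print(len(encoded))   # bytes: 5 * 3 for the Chinese characters + 4 for ASCII
```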



\u4e2d, a unicode escape, converted to Chinese is 中