use utf8

[oracle@oadb utf-8]$ cat a1.pl 
my $str="中国";
print length($str);
print "
";
print $str."
";
[oracle@oadb utf-8]$ perl a1.pl 
6
中国

此时长度是6


[oracle@oadb utf-8]$ cat a1.pl 
use utf8;
my $str="中国";
print length($str);
print "
";
print $str."
";
[oracle@oadb utf-8]$ perl a1.pl 
2
Wide character in print at a1.pl line 5.
中国

此时启用utf8 长度为2


[oracle@oadb utf-8]$ cat a1.pl 
use utf8;
use Encode;
my $str="中国";
   $str=encode_utf8($str);
print length($str);
print "
";
print $str."
";
[oracle@oadb utf-8]$ perl a1.pl 
6
中国
此时长度为6

常用中文字符用utf-8编码占用3个字节(大约2万多字),但超大字符集中的更大多数汉字要占4个字节(在unicode编码体系中,U+20000开始有5万多汉字)

字节 -> decode ->字符串 ->encode ->字节 

字符串长度短,字节长度长

原文地址:https://www.cnblogs.com/hzcya1995/p/13349831.html