rune and byte in Go

Overview

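byte is an alias for uint8. It is commonly used for ASCII characters and for raw data.
rune is an alias for int32. It is commonly used to hold a Unicode code point, such as one decoded from UTF-8 text.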

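Unicode is a superset of ASCII. ASCII can represent only a small set of symbols, and as more countries connected to the internet it could no longer accommodate the scripts of every country and region, so Unicode was created. UTF-8 is an encoding of Unicode (and the one Go uses for source code and string literals): it encodes every Unicode code point in 1 to 4 bytes, using 1 byte for ASCII and 2 to 4 bytes for all other characters.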
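
The 1-to-4-byte range can be checked with the standard unicode/utf8 package. The following is a minimal sketch; the helper name encodedWidths and the sample characters are chosen here for illustration and are not part of the original text.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedWidths is a hypothetical helper: for each character in s it
// reports how many bytes UTF-8 needs to encode that character.
func encodedWidths(s string) []int {
	widths := make([]int, 0, len(s))
	for _, r := range s { // range over a string yields decoded runes
		widths = append(widths, utf8.RuneLen(r))
	}
	return widths
}

func main() {
	// 'a' is ASCII, U+00E9 is an accented Latin letter, '测' is a Chinese
	// character, and U+1F600 is an emoji outside the Basic Multilingual Plane.
	fmt.Println(encodedWidths("a\u00e9测\U0001F600")) // [1 2 3 4]
}
```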

When handling ordinary characters (such as English letters and digits), rune and byte hold the same values.

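	s := "abc123"
	// Each ASCII character is a single byte in UTF-8.
	b := []byte(s)
	fmt.Printf("abc123 convert to []byte is %v\n", b)
	// Each ASCII character is also a single rune with the same value.
	r := []rune(s)
	fmt.Printf("abc123 convert to []rune is %v\n", r)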

Output

abc123 convert to []byte is [97 98 99 49 50 51] 
abc123 convert to []rune is [97 98 99 49 50 51]

With non-ASCII characters such as Chinese, however, a []byte needs three elements to store one Chinese character, while a []rune needs only one.

Most Chinese characters take 3 bytes in UTF-8.

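	s := "测试"
	// The two characters are encoded as six bytes, three per character.
	b := []byte(s)
	fmt.Printf("测试 convert to []byte is %v\n", b)
	// The same string decodes to two runes, one per character.
	r := []rune(s)
	fmt.Printf("测试 convert to []rune is %v\n", r)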

测试 convert to []byte is [230 181 139 232 175 149] 
测试 convert to []rune is [27979 35797]

Why

First, look at the definitions in Go's builtin package:

// byte is an alias for uint8 and is equivalent to uint8 in all ways. It is
// used, by convention, to distinguish byte values from 8-bit unsigned
// integer values.
type byte = uint8

// rune is an alias for int32 and is equivalent to int32 in all ways. It is
// used, by convention, to distinguish character values from integer values.
type rune = int32

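A byte is one byte (8 bits) wide, while a rune is four bytes (32 bits) wide, which is enough to hold any Unicode code point. This explains the difference above: a []byte holds the UTF-8 encoding of the string, so each Chinese character spans three elements, whereas a []rune holds one decoded code point per element.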
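
The same difference shows up when measuring a string: len counts bytes, while utf8.RuneCountInString counts characters. A small sketch, where the helper name counts is an assumption made for this example:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper that returns the byte length and the
// rune (character) count of s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := counts("测试")
	fmt.Println(b, r) // 6 2

	b, r = counts("abc123")
	fmt.Println(b, r) // 6 6
}
```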

Slicing Chinese text also behaves differently for string, []byte, and []rune:

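	c := "一二三四五"
	// Slicing a string counts bytes, so this cuts through the first character.
	c1 := c[:2]
	fmt.Printf("c1 is %v\n", c1)
	// Six bytes cover exactly two 3-byte characters.
	bc1 := []byte(c)[:6]
	fmt.Printf("c1 with []byte is %v\n", string(bc1))
	// Two runes are exactly two characters.
	rc1 := []rune(c)[:2]
	fmt.Printf("c1 with []rune is %v\n", string(rc1))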

c1 is ��
c1 with []byte is 一二
c1 with []rune is 一二

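When taking a substring of Chinese text, do not slice the string directly, because string indexes count bytes and the cut can land in the middle of a character. It is safer to convert to a []rune first.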
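
As an alternative to converting the whole string to a []rune, one possible approach is to walk the string with range, which yields the byte offset at which each character starts, and cut at a character boundary. The helper firstRunes below is a hypothetical name introduced for this sketch, not something from the standard library.

```go
package main

import "fmt"

// firstRunes is a hypothetical helper that returns the prefix of s holding
// at most n characters, always cutting on a character boundary.
func firstRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	for i := range s { // i is the byte offset where each character starts
		if count == n {
			return s[:i]
		}
		count++
	}
	return s // s has n characters or fewer
}

func main() {
	fmt.Println(firstRunes("一二三四五", 2)) // 一二
}
```

This avoids allocating a rune slice for the whole string, which may matter for long inputs; for short strings the []rune conversion is simpler to read.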