day02 - Characters and Character Encoding

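1. Strings
  Characteristics:
    Immutable, holds a single value, ordered (characters can be accessed by index).
    Immutability is demonstrated by the code below. As introduced in day01, a type is mutable when its content can change while its ID stays the same. A string cannot be changed in place: assigning new content creates a new object, so the ID changes as well.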

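>>> word = 'test'  # define the variable
>>> old_id = id(word)  # remember the ID of the original string
>>> word = 'test0001'  # assign new content to the variable
>>> print(id(word) == old_id)  # a new object was created, so the ID differs
False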

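A minimal sketch of what immutability means in practice. The variable names here are illustrative and not part of the original notes: string methods hand back a new string, and item assignment is rejected.

```python
word = 'test'
upper_word = word.upper()  # upper() builds a new string; word itself is untouched

try:
    word[0] = 'T'  # strings do not support item assignment
    assignment_failed = False
except TypeError:
    assignment_failed = True

print(word, upper_word, assignment_failed)
```

To "change" a string, build a new one and rebind the name, for example word = 'T' + word[1:].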

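  Features:
    Splitting: print(word.split()) splits the string on whitespace by default, working from left to right; rsplit works from right to left. print(word.partition(' ')) returns a tuple.

    print(word.splitlines()) splits at line boundaries ('\r', '\r\n', '\n') and returns a list with one element per line. If the keepends argument is False the line breaks are dropped; if it is True they are kept.
    See the code below. Points to note: (a) split returns a list. (b) split takes two arguments: the first is the separator to split on (a string, so it must be quoted), and the second is the maximum number of splits, counted from the left.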

>>> word = 'test'  # define the variable
>>> print(word.split())  # splits on whitespace by default; the result is a list
['test']
>>> print(word.split('t'))  # split on 't'
['', 'es', '']
>>> print(word.split('t',1))  # split at most once, from the left
['', 'est']
>>> print(word.rsplit('t',1))  # rsplit == right split
['tes', '']
>>> word = 'hello world'  # define the variable
>>> print(word.partition(' '))  # returns a 3-tuple: the part before the separator, the separator itself, and the part after it
('hello', ' ', 'world')
>>> print(word.rpartition(' '))  # same, but the separator is searched for from the right
('hello', ' ', 'world')
>>> word = 'hello world \r hello2 world \r\n hello3 world \n hello4 world'  # define the variable
>>> print(word.splitlines())  # split at line boundaries ('\r', '\r\n', '\n'); keepends defaults to False, so the line breaks are dropped
['hello world ', ' hello2 world ', ' hello3 world ', ' hello4 world']

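As a rough illustration of how split, partition and splitlines combine, the sketch below parses a made-up 'key=value' record. The record format and the variable names are assumptions for the example only.

```python
record = 'name=test01;passwd=passwd01'  # hypothetical input format

fields = {}
for item in record.split(';'):           # split() returns a list of substrings
    key, _, value = item.partition('=')  # partition() returns a 3-tuple
    fields[key] = value

text = 'line1\nline2\r\nline3'
without_breaks = text.splitlines()            # keepends defaults to False
with_breaks = text.splitlines(keepends=True)  # line breaks are kept

print(fields)
print(without_breaks)
print(with_breaks)
```

partition is convenient here because it always returns three parts, even when the separator is missing.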

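    Stripping: print(word.strip()), together with lstrip and rstrip, removes characters from the start and end of the string; whitespace is removed by default. See the code below:

>>> word = '  test 01  '  # define the variable, with spaces on both sides
>>> print(word.strip())  # removes leading and trailing whitespace by default
test 01
>>> print(word.rstrip())  # rstrip removes whitespace on the right (r == right)
  test 01
>>> print(word.lstrip())  # lstrip removes whitespace on the left (l == left); the trailing spaces remain but are invisible
test 01
>>> word = 'test'  # define the variable, with letters at both ends
>>> print(word.strip('t'))  # strip the given character from both ends
es
>>> print(word.lstrip('t'))  # left strip
est
>>> print(word.rstrip('t'))  # right strip
tes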

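One point that is easy to miss: the argument to strip is treated as a set of characters, not as a prefix or suffix. The short sketch below shows the difference; the example URL and variable names are illustrative.

```python
url = 'www.example.com'

# Every leading/trailing character that appears in the set is removed.
stripped = url.strip('cmowz.')
print(stripped)  # example

# To drop an exact prefix, test for it and slice instead.
prefix = 'www.'
without_prefix = url[len(prefix):] if url.startswith(prefix) else url
print(without_prefix)  # example.com
```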

    Padding: print(word.center(19,'*')), print(word.ljust(20,'0')), print(word.rjust(20,'0')). See the code below:

>>> word = 'test'  # define the variable
>>> print(word.center(19,'*'))  # pad on both sides: the first argument is the total width, the second is the fill character
********test*******
>>> word = 'hello world'  # define the variable
>>> print(word.ljust(20,'0'))  # left-justify (left just) and pad on the right to a total width of 20
hello world000000000
>>> print(word.rjust(20,'0'))  # right-justify (right just) and pad on the left to a total width of 20
000000000hello world
>>> print(word.zfill(20))  # right-justify the string and pad with 0 on the left
000000000hello world

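A small sketch of where padding is typically useful: lining up columns in plain-text output. The rows and the column widths are made up for the example.

```python
rows = [('name', 'test01'), ('passwd', 'passwd01')]  # illustrative data

lines = []
for key, value in rows:
    # pad the key on the right with dots, right-align the value in 10 columns
    lines.append(key.ljust(10, '.') + value.rjust(10))

for line in lines:
    print(line)

signed = '-42'.zfill(6)  # zfill keeps a leading sign in front of the zeros
print(signed)
```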

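    Counting: print(word.count('t')). See the code below:

>>> word = 'test001 t3'  # define the variable
>>> print(word.count('t'))  # count the occurrences; the whole string is searched by default
3
>>> print(word.count('t',2))  # count the occurrences from index 2 to the end of the string
2
>>> print(word.count('t',2,7))  # count the occurrences from index 2 up to (not including) index 7
1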

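    Encoding to bytes: print(word.encode()) returns the string encoded as bytes (UTF-8 by default).

    Checking the end of a string: print(word.endswith('3'))

>>> word = 'test001 t3'  # define the variable
>>> print(word.endswith('3'))  # check whether the string ends with '3'
True
>>> print(word.endswith('t'))  # check whether the string ends with 't'
False
>>> print(word.endswith('0',1,5))  # check whether the slice from index 1 up to (not including) index 5 ends with '0'
True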

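    Capitalize the first character: print(word.capitalize()); the remaining characters are lowercased.

    Convert uppercase to lowercase: print(word.lower())

    Convert lowercase to uppercase: print(word.upper())

    Swap case: print(word.swapcase()), i.e. uppercase becomes lowercase and lowercase becomes uppercase.

    Title case: print(word.title()), i.e. the first letter of every word is uppercase and the rest are lowercase.

    Expand tabs to spaces: print(word.expandtabs())

word = 'test001 \t t3'  # define the variable
print(word.expandtabs())  # replace each tab (\t) with spaces up to the next tab stop (every 8 columns by default)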

    Searching for a substring and returning its index: print(word.find('e')), print(word.index('t')), print(word.index('t',2))

>>> word = 'test001 t3'  # define the variable
>>> print(word.find('e'))  # find the first occurrence and return its index
1
>>> print(word.find('a'))  # returns -1 when the substring is not found
-1

>>> word = 'hello world'  # define the variable
>>> print(word.rfind('e',2))  # search from the right within word[2:]; not found, so -1
-1
>>> print(word.rfind('e',1))  # search from the right within word[1:]; returns the index of the last occurrence
1

>>> word = 'test001 t3'  # define the variable
>>> print(word.index('t'))  # find the first occurrence and return its index
0
>>> print(word.index('t',2))  # find the first occurrence at or after index 2
3
>>> print(word.index('t',2,3))  # search only word[2:3]; index raises an error when nothing is found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found

>>> word = 'hello world'  # define the variable
>>> print(word.rindex('e',1))  # search from the right within word[1:]; returns the index of the last occurrence
1
>>> print(word.rindex('e',2))  # search from the right within word[2:]; raises an error when nothing is found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found

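The practical difference between find and index is how a miss is reported. The sketch below wraps both behind hypothetical helper functions (position_or_none and position_via_index are names invented for this example) so that a miss becomes None either way.

```python
def position_or_none(text, sub):
    """Hypothetical helper: index of sub in text, or None when it is absent."""
    pos = text.find(sub)  # find() signals "not found" with -1
    return None if pos == -1 else pos


def position_via_index(text, sub):
    """Same result, built on index(), which raises ValueError instead."""
    try:
        return text.index(sub)
    except ValueError:
        return None


word = 'test001 t3'
print(position_or_none(word, 'e'))     # 1
print(position_or_none(word, 'a'))     # None
print(position_via_index(word, 't3'))  # 8
```

Checking for -1 explicitly matters because -1 is also a valid index (the last character), so passing it on unchecked can hide a bug.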

    String formatting: print('word is {0}'.format('test001'))

>>> word = 'test001 t3'  # define the variable
>>> print('the first word is {1},the second word is {0}'.format(*word))  # string formatting; note that *word unpacks the string into characters, so the fields refer to characters by index
the first word is e,the second word is t
>>> print('the first word is {1},the second word is {0}'.format('test001','t3'))  # string formatting with two positional arguments
the first word is t3,the second word is test001

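Positional fields are one of several equivalent ways to fill in a template. As a short sketch (the variable names are illustrative), the same text can be produced with named fields or with an f-string, assuming Python 3.6 or later for the latter.

```python
first, second = 'test001', 't3'

by_position = 'first: {0}, second: {1}'.format(first, second)
by_name = 'first: {a}, second: {b}'.format(a=first, b=second)
by_fstring = f'first: {first}, second: {second}'  # f-strings need Python 3.6+

print(by_position)
print(by_name)
print(by_fstring)
```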

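    Check whether the string consists only of letters and digits: print(word.isalnum()) returns True if so, False otherwise.

    Check whether the string consists only of letters: print(word.isalpha()) returns True if so, False otherwise.

    Check whether the string consists only of decimal digits: print(word.isdecimal()) returns True if so, False otherwise.

    Check whether the string consists only of digits: print(word.isdigit()) returns True if so, False otherwise. Unlike isdecimal, this also accepts digit characters such as superscripts.

    Check whether the string is a valid identifier, i.e. it contains only letters, digits and underscores and does not start with a digit: print(word.isidentifier()) returns True if so, False otherwise.

    Check whether all letters in the string are lowercase: print(word.islower()) returns True if so, False otherwise.

    Check whether the string consists only of numeric characters: print(word.isnumeric()) returns True if so, False otherwise. This is broader than isdigit and also accepts characters such as fractions and Chinese numerals.

    Check whether every character in the string is printable: print(word.isprintable()) returns True if so, False otherwise.

    Check whether the string consists only of whitespace: print(word.isspace()) returns True if so, False otherwise.

    Check whether the string is title-cased, i.e. every word starts with an uppercase letter and its remaining letters are lowercase: print(word.istitle()) returns True if so, False otherwise.

    Check whether all letters in the string are uppercase: print(word.isupper()) returns True if so, False otherwise.

    Check whether the string starts with the given substring: print(word.startswith('he',2,5)) checks the slice from index 2 up to (not including) index 5 and returns True if so, False otherwise.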
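The three digit checks above are easy to confuse, so here is a small sketch that runs them side by side. The sample characters are chosen for illustration and written as escapes to avoid any dependence on the file's encoding.

```python
samples = {
    'plain digits': '123',
    'superscript two': '\u00b2',
    'fraction one half': '\u00bd',
    'Chinese numeral one': '\u4e00',
}

results = {}
for label, text in samples.items():
    # (isdecimal, isdigit, isnumeric) for each sample
    results[label] = (text.isdecimal(), text.isdigit(), text.isnumeric())
    print(label, results[label])

# isidentifier: a leading digit makes the name invalid
print('word_1'.isidentifier(), '1word'.isidentifier())
```

For validating user input that will be passed to int(), isdecimal is usually the safest of the three.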

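    Joining strings: print(':'.join(word))

>>> word = ['hello','world']  # define the variable as a list
>>> print(':'.join(word))  # join the elements of word with ':' as the separator
hello:world
>>> word2 = 'hello world'  # define the variable as a string
>>> print(':'.join(word2))  # a string is iterated character by character
h:e:l:l:o: :w:o:r:l:d
>>> word3 = ('hello','world')  # define the variable as a tuple
>>> print(':'.join(word3))  # join the elements of word3 with ':' as the separator
hello:world
>>> word4 = {'hello':1,'world':2}  # define the variable as a dict
>>> print(':'.join(word4))  # iterating a dict yields its keys, so the keys are joined
hello:world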

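One restriction worth knowing: join only accepts strings. The sketch below, using an illustrative mixed list, shows the error and one common way around it.

```python
values = ['test', 1, 2.5]  # illustrative list with non-string items

try:
    ':'.join(values)  # join() only accepts strings
    join_failed = False
except TypeError:
    join_failed = True

joined = ':'.join(str(v) for v in values)  # convert each item first
print(join_failed, joined)
```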

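    Replacing substrings: print(word.replace('l','new',2))

>>> word = 'hello world'  # define the variable
>>> print(word.replace('l','new',2))  # the first argument is the substring to replace, the second is the replacement, and the third is the maximum number of replacements
henewnewo world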

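2. Dictionaries
  Characteristics: mutable, holds multiple values, not indexed by position (values are accessed by key rather than by a numeric index; since Python 3.7 insertion order is preserved).

   Mutability means the value changes while the ID stays the same, as shown below:

>>> word = {'name':'test01','passwd':'passwd01'}  # define a dict
>>> old_id = id(word)  # remember the ID
>>> word['name'] = 'test02'  # change a value in place
>>> print(id(word) == old_id)  # still the same object, so the ID is unchanged
True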

  Features:

    Return a view of all the keys in a dict: print(word.keys())

    Return a view of all the values in a dict: print(word.values())

    Return an iterable view of (key, value) tuples: print(word.items())

>>> word = {'name':'test01','passwd':'passwd01'}  # define a dict
>>> print(word.items())  # get the (key, value) pairs
dict_items([('name', 'test01'), ('passwd', 'passwd01')])

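    Remove the given key and return its value: print(word.pop('name')). The key must be given. If the key is not in the dict, the default is returned when one was passed; otherwise KeyError is raised.

    Remove every item (empty the dict): word.clear(), which returns None.

    Remove and return a (key, value) pair: print(word.popitem()). Since Python 3.7 this is the most recently inserted pair; in earlier versions the pair was arbitrary.

    Create a new dict from the keys of another: word2 = word2.fromkeys(word)

>>> word2 = {}  # create a new dict
>>> word = {'name':'test01','passwd':'passwd01'}  # define a dict
>>> word2 = word2.fromkeys(word)  # use the elements of the iterable as keys; the values default to None
>>> print(word2)
{'name': None, 'passwd': None}
>>> word2 = word2.fromkeys(word,'testtest')  # use the elements of the iterable as keys, with a custom value
>>> print(word2)
{'name': 'testtest', 'passwd': 'testtest'}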

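    Getting a value from a dict: print(word.get('name')). If the dict contains the given key, its value is returned; otherwise the supplied default is returned (None if no default is given). setdefault behaves the same way, except that a missing key is also inserted into the dict with the default value.

>>> word = {'name':'test01','passwd':'passwd01'}  # create a new dict
>>> print(word.get('na','test'))  # the key is not in the dict, so the given default is returned
test
>>> print(word.get('na'))  # the key is not in the dict and no default is given, so None is returned
None
>>> print(word.get('name'))  # return the value for the given key
test01
>>> print(word.get('name','test'))  # the key exists, so the default is ignored
test01
>>> word = {'name':'test01','passwd':'passwd01'}  # create a new dict
>>> print(word.setdefault('name'))  # return the value for the given key
test01
>>> print(word.setdefault('na','test'))  # the key is not in the dict, so it is inserted with the given value, which is returned
test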

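To make the difference between get and setdefault concrete, here is a rough sketch of two common patterns: counting with get and grouping with setdefault. The word list and the grouping rule (the last two characters) are invented for the example.

```python
words = ['test01', 'test02', 'test01']  # illustrative data

counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1  # get() never modifies the dict

groups = {}
for w in words:
    # setdefault() inserts an empty list the first time a key is seen
    groups.setdefault(w[-2:], []).append(w)

print(counts)
print(groups)
```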

    Updating one dict with the items of another: existing keys are overwritten, missing keys are added

>>> word = {'name':'test01','sex':'man'}  # define a dict
>>> word2 = {'sex':'female','hobby':'music'}  # define a dict
>>> word.update(word2)
>>> print(word)
{'name': 'test01', 'sex': 'female', 'hobby': 'music'}

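3. Lists
  Characteristics: mutable, holds multiple values, ordered (values can be accessed by index, starting at 0).

>>> word = ['test01','test02','test03']  # define a list
>>> old_id = id(word)  # remember the ID
>>> word.append('test04')  # change the list in place
>>> print(id(word) == old_id)  # still the same object, so the ID is unchanged
True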

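  Features:

    Remove and return the last element of the list: print(word.pop())

    Copy the list (a shallow copy): print(word.copy())

    Empty the list: word.clear(), which returns None.

    Remove the first occurrence of the given value: word.remove('test02')

    Count how many times the given value occurs: print(word.count('test01'))

    Reverse the list in place: word.reverse()

    Sort the list in place: word.sort()

    Find the first occurrence of a value and return its index: print(word.index('test02'))

    Adding elements: word.append('test04'), word.extend('test04'), word.insert(2,'test04')

>>> word = ['test01','test02','test03']  # define a list
>>> word.append('test04')  # add a new object at the end, as a single element
>>> print(word)
['test01', 'test02', 'test03', 'test04']
>>> word.extend('test04')  # add each item of the iterable at the end; a string is split into characters
>>> print(word)
['test01', 'test02', 'test03', 'test04', 't', 'e', 's', 't', '0', '4']
>>> word = ['test01','test02','test03']  # define a list
>>> word.insert(2,'test04')  # insert a new object at the given index
>>> print(word)
['test01', 'test02', 'test04', 'test03']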

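A few list behaviours that the one-line descriptions above do not spell out, shown as a minimal sketch with illustrative data: extend with a list adds whole elements, sort works in place and returns None, and copy produces an independent list.

```python
word = ['test03', 'test01']
word.extend(['test04', 'test02'])  # passing a list adds each element whole

result = word.sort()                       # sort() works in place and returns None
newest_first = sorted(word, reverse=True)  # sorted() returns a new list

snapshot = word.copy()  # shallow copy, independent of later appends
word.append('test05')

print(word)
print(result, newest_first)
print(snapshot)
```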

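4. Tuples
  Characteristics: immutable, holds multiple values, ordered (values can be accessed by index, starting at 0).

  Features:
    Find an element and return its index: print(word.index('test02'))
    Count how many times an element occurs: print(word.count('test02'))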
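5. Sets

  Characteristics: mutable, holds multiple unique values, unordered (values cannot be accessed by index).

  Features:

    Remove and return an arbitrary element of the set: word2 = word.pop()

    Remove the given element from the set: word.remove('test01'), which returns None and raises KeyError if the element is missing.

    Remove every element of the set: word.clear(), which returns None.

    Copy the elements of the set into a new set: word2 = word.copy()

    Add the elements of an iterable to word: word.update('test01'). Note: a string is split into its characters, and each character is added as a separate element.

    Add a single element: word.add('test04'). A set is unordered, so there is no end to append to.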
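The set operations above can be sketched as follows. The element values are illustrative; discard is included as an assumption-free standard alternative to remove for the case where the element may be missing.

```python
word = {'test01', 'test02', 'test02'}  # duplicates collapse into one element
size = len(word)

word.add('test03')
word.update('ab')        # a string is iterated character by character
word.update(['test04'])  # wrap the string in a list to add it whole

word.discard('missing')  # discard() silently ignores a missing element
try:
    word.remove('missing')  # remove() raises KeyError instead
    remove_failed = False
except KeyError:
    remove_failed = True

print(size, remove_failed)
print(sorted(word))  # sorted() gives a stable order for display
```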

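6. Comparison of the data types

  Numbers

  Strings

  Sets: unordered, so no index information needs to be stored

  Tuples: ordered, index information has to be stored, immutable

  Lists: ordered, index information has to be stored, mutable, must support adding, removing and modifying data

  Dicts: accessed by key rather than by position, the key-to-value mapping has to be stored, mutable, must support adding, removing and modifying data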

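7. Character encoding

  Character encoding types:

    ASCII: The computer was invented in the United States, so early machines only needed an encoding for English letters and special computer symbols. ASCII defines 128 characters using 7 bits, and each character is stored in one byte (8 bits, which allows 2 to the power of 8 = 256 values). That was enough.

    GBK/GB2312: As computers spread to China and the rest of the world, each country created its own encoding. China's are GB2312 and its extension GBK. Chinese has far too many characters for the 256 values of a single byte, so these encodings use 16 bits (2 bytes) per Chinese character, which allows at most 2 to the power of 16 = 65,536 combinations.

    Unicode: With every country using its own encoding, each country's own needs were met, but a new need arose: how to handle text from several countries at once? Japanese text written by a Chinese user cannot be read correctly with either ASCII or GBK. Unicode was created to solve this by assigning every character of every script a unique number (code point). Early versions represented each character with 16 bits (2 bytes); current versions define code points up to U+10FFFF.

    UTF-8: A variable-length encoding of Unicode. With a fixed 2-byte representation, English characters also take 2 bytes, which wastes some space. UTF-8 uses 8 bits (1 byte) for English characters and more bytes for other characters (3 for most Chinese characters), which saves space for text that is mostly English.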
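The byte counts described above can be checked directly. This is a minimal sketch that encodes one English letter and one Chinese character (U+4E2D, written as an escape) and records how many bytes each encoding uses; the sizes dictionary is just a name chosen for the example.

```python
sizes = {}
for ch in ('a', '\u4e2d'):  # an English letter and the Chinese character U+4E2D
    sizes[ch] = {
        'utf-8': len(ch.encode('utf-8')),
        'gbk': len(ch.encode('gbk')),
        'utf-16': len(ch.encode('utf-16-le')),  # little-endian, no byte order mark
    }
    print(repr(ch), sizes[ch])
```

The English letter takes 1 byte in UTF-8 and GBK but 2 in UTF-16, while the Chinese character takes 3 bytes in UTF-8 and 2 in the other two.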

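  How a file is written:

    1. Start the editor.

    2. Type the file content, which is first held in memory. (At this point there is no notion of a file encoding; to the operating system, what is typed is just a sequence of symbols.)

    3. Press Ctrl+S to save to disk. (Now the file encoding matters: the encoding the editor uses when writing is the file's actual encoding.)

    4. How does the disk know which encoding was written? It does not; it only stores bytes. The operating system (Linux in these notes) works with Unicode in memory, and what is stored on disk is encoded with the encoding specified by the editor.

  How a file is read:

    1. Start the editor.

    2. Read the file from disk using the specified encoding.

    3. Load the file into memory.

    4. Display the file on screen using some encoding. (The encoding used for reading must be the same as the one used for writing; otherwise the text comes out garbled.)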
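The garbling mentioned in the last step can be reproduced without touching a real file. This is a small sketch under the assumption that the text was saved as UTF-8 and then read back as GBK; errors='replace' is used so that undecodable bytes do not raise an exception.

```python
text = '\u4e2d\u6587'  # two Chinese characters, written as escapes

stored = text.encode('utf-8')                     # the bytes as they would be saved to disk
same = stored.decode('utf-8')                     # read back with the same encoding
garbled = stored.decode('gbk', errors='replace')  # read back with a different encoding

print(same == text)     # True: the round trip is lossless
print(garbled == text)  # False: the bytes were misinterpreted
```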
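8. File handling
  Reading a file:

list = open('userlist','r',encoding='utf-8')  # open the file and assign the resulting file handle to a variable
list_t = list.read()  # operate on the file through the handle
list.close()  # close the file
print(list_t)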

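  list = open('userlist','r',encoding='utf-8')
   1. The application issues the system call open(...) to the operating system.
   2. The operating system opens the file and returns a file handle to the application.
   3. The application assigns the file handle to the variable list. (This name shadows the built-in list type, so a different name is preferable in real code.)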

  Writing a file:

with open('userlist','r',encoding='utf-8') as read_f:  # open the file for reading using with
    data = read_f.read()  # read the file
with open('userlist','w',encoding='utf-8') as write_f:  # open the file for writing; note that 'w' empties the file as soon as it is opened
    write_f.write('please')  # write to the file; the previous content is gone
with open('userlist','a',encoding='utf-8') as write_a:  # use 'a' to append to the end
    write_a.write('double please')

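To see the difference between 'w' and 'a' without risking a real userlist file, the sketch below works in a temporary directory. The path and the text written are placeholders for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'userlist')  # throwaway file, not the real userlist

    with open(path, 'w', encoding='utf-8') as write_f:
        write_f.write('first\n')
    with open(path, 'w', encoding='utf-8') as write_f:  # 'w' truncates on open
        write_f.write('please\n')
    with open(path, 'a', encoding='utf-8') as write_a:  # 'a' keeps existing content
        write_a.write('double please\n')

    with open(path, 'r', encoding='utf-8') as read_f:
        content = read_f.read()

print(content)
```

The with statement closes each file automatically, so no explicit close() call is needed.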

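  File operations:

  Read the whole file; the cursor moves to the end of the file: list_t = list.read()

  Read one line; the cursor moves to the start of the second line: list_t = list.readline()

  Read every line and store the lines in a list: list_t = list.readlines()

  When writing in text mode, line breaks must be written explicitly: list_t = list.write('testr\ntestp\n')

  Check whether the file is readable: print(list.readable()) returns True if so, False otherwise.

  Check whether the file is writable: print(list.writable()) returns True if so, False otherwise.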
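The cursor behaviour described above can be sketched as follows, again using a temporary file so that nothing real is modified. The file content and variable names are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'userlist')  # throwaway file for the example

    with open(path, 'w', encoding='utf-8') as f:
        f.write('test01\ntest02\ntest03\n')     # line breaks written explicitly
        can_read_while_writing = f.readable()  # False in 'w' mode

    with open(path, 'r', encoding='utf-8') as f:
        first_line = f.readline()  # the cursor is now at the start of line 2
        remaining = f.readlines()  # the rest of the lines, as a list
        at_end = f.read()          # nothing is left, so an empty string
        can_write_while_reading = f.writable()  # False in 'r' mode

print(repr(first_line), remaining, repr(at_end))
print(can_read_while_writing, can_write_while_reading)
```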