[python][LXF][Notes] File I/O and Serialization

reference website:

File reading and writing

https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/001431917715991ef1ebc19d15a4afdace1169a464eecc2000

From the Python help documentation:

    ========= ===============================================================
    Character Meaning
    --------- ---------------------------------------------------------------
    'r'       open for reading (default)
    'w'       open for writing, truncating the file first
    'x'       create a new file and open it for writing
    'a'       open for writing, appending to the end of the file if it exists
    'b'       binary mode
    't'       text mode (default)
    '+'       open a disk file for updating (reading and writing)
    'U'       universal newline mode (deprecated)
    ========= ===============================================================

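'r': read

'w': write

'a': append

'r+' == r+w (readable and writable; writes overwrite from the start of the file, replacing exactly as many bytes as are newly written; raises an error if the file does not exist (FileNotFoundError in Python 3, a subclass of OSError/IOError))

'w+' == w+r (readable and writable; the file is truncated first, so whatever it contained is discarded and rewritten; the file is created if it does not exist)

'a+' == a+r (appendable and readable; writes always go to the end; the file is created if it does not exist)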
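The difference between the three "+" modes is easier to see by running them in sequence on one file. The following is a minimal sketch; the scratch file `modes_demo.txt` in a temporary directory is an assumption made for illustration and is not part of the original notes.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'w+' creates (or truncates) the file and also allows reading it back.
with open(demo_path, 'w+') as f:
    f.write('Hello World!')
    f.seek(0)                  # move back to the start before reading
    after_w_plus = f.read()

# 'r+' needs an existing file and overwrites from the beginning,
# replacing only as many characters as are written.
with open(demo_path, 'r+') as f:
    f.write('HELLO')
    f.seek(0)
    after_r_plus = f.read()

# 'a+' always writes at the end of the file.
with open(demo_path, 'a+') as f:
    f.write('!!')
    f.seek(0)
    after_a_plus = f.read()

print(after_w_plus)   # Hello World!
print(after_r_plus)   # HELLO World!
print(after_a_plus)   # HELLO World!!!
```

Note the `seek(0)` calls: after a write the file position sits at the end of what was written, so reading without seeking would return only what follows that position.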

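1. Reading files

# path = file name (relative or absolute path)
# A raw string (r'...') is used so that \t in the Windows path is not read as a tab.

path = r'D:\Python\workspace\test.txt'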

>>> f = open(path,'r')
>>> f.read()
'Hello\nWorld!'
>>> f.close()

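However, if an exception is raised while the file is being read, the f.close() that follows is never called. The with statement takes care of this.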

>>> with open(path,'r') as f:
    print(f.read())

Hello
World!

This is equivalent to try...finally (omitted in these notes); the with statement is more concise, and there is no need to call f.close().
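For comparison, the try...finally form that the with statement replaces might look like the sketch below. The temporary file and the name `demo_path` are assumptions introduced so that the example runs on its own.

```python
import os
import tempfile

# Hypothetical file created just so the example is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(demo_path, 'w') as f:
    f.write('Hello\nWorld!')

# Manual equivalent of "with open(...) as f": close() runs even if read() fails.
f = open(demo_path, 'r')
try:
    content = f.read()
finally:
    f.close()

print(content)
print(f.closed)   # True
```

If open() itself fails, there is no file object to close, which is why the call to open() sits outside the try block.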

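read()  # read the entire file at once

read(size)  # read at most size characters per call (size bytes in binary mode)

readline()  # read one line per call

readlines()  # read everything at once and return a list of lines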

>>> with open(path,'r') as f:
    print(f.read(3))

Hel

>>> with open(path,'r') as f:
    print(f.readline())

Hello

>>> with open(path,'r') as f:
    print(f.readlines())
    
['Hello\n', 'World!']
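For files too large to load with a single read(), the read(size) and line-based calls above are usually combined with a loop. This is a minimal sketch; the chunk size of 4 and the temporary file are arbitrary choices made for illustration.

```python
import os
import tempfile

# Hypothetical file with the same content as test.txt in the notes.
demo_path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(demo_path, 'w') as f:
    f.write('Hello\nWorld!')

# Read in fixed-size pieces until read() returns an empty string (end of file).
chunks = []
with open(demo_path, 'r') as f:
    while True:
        chunk = f.read(4)
        if not chunk:
            break
        chunks.append(chunk)

# Iterating over the file object yields one line at a time.
lines = []
with open(demo_path, 'r') as f:
    for line in f:
        lines.append(line.rstrip('\n'))

print(chunks)   # ['Hell', 'o\nWo', 'rld!']
print(lines)    # ['Hello', 'World!']
```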

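Everything above reads text files in text mode. When no encoding argument is given, open() decodes with the platform's default encoding, so pass encoding='utf-8' explicitly for UTF-8 files. To read binary files such as images or videos, open the file in 'rb' mode.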
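The difference between text mode and 'rb' mode can be seen by reading the same file both ways. A small sketch, assuming a scratch file named `encoding_demo.txt` (a made-up name) that contains one non-ASCII character:

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), 'encoding_demo.txt')

# Write four characters; the last one (e with acute accent) is not ASCII.
with open(demo_path, 'w', encoding='utf-8') as f:
    f.write('caf\u00e9')

# Text mode decodes the bytes into a str.
with open(demo_path, 'r', encoding='utf-8') as f:
    text = f.read()

# Binary mode returns the raw bytes without decoding.
with open(demo_path, 'rb') as f:
    raw = f.read()

print(len(text))   # 4
print(raw)         # b'caf\xc3\xa9'
```

The str has four characters while the bytes object has five, because the accented character takes two bytes in UTF-8.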

2. Writing files

>>> with open(path,'w') as f:
    f.write('Hello\nWorld!')

In test.txt (escape sequences are interpreted):

Hello
World!

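>>> with open(path,'a') as f:
    f.write(r'\nwelcome!')

'a' writes in append mode. Prefixing the string with r makes it a raw string, so escape sequences are not interpreted and \n is written as two literal characters.

So test.txt now contains:

Hello
World!\nwelcome!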

>>> with open('test_create.txt','x') as f_new:
    f_new.write('test para x')

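'x' creates a new file and opens it for writing only (it cannot be read). If the file already exists, open() raises FileExistsError.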
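The "create only" behaviour of 'x' shows up when the same open() call is made twice. A minimal sketch, using a temporary directory (an assumption for illustration) so that the first call is guaranteed to succeed:

```python
import os
import tempfile

# Hypothetical location; a fresh temporary directory ensures the file is new.
new_path = os.path.join(tempfile.mkdtemp(), 'test_create.txt')

# First call: the file does not exist yet, so it is created.
with open(new_path, 'x') as f_new:
    f_new.write('test para x')

# Second call: the file now exists, so 'x' refuses to open it.
try:
    with open(new_path, 'x') as f_new:
        f_new.write('second attempt')
    second_open_failed = False
except FileExistsError:
    second_open_failed = True

with open(new_path, 'r') as f:
    content = f.read()

print(second_open_failed)   # True
print(content)              # test para x
```

This makes 'x' a safer choice than 'w' when overwriting an existing file by accident would be a problem.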

Serialization

reference website:

https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/00143192607210600a668b5112e4a979dd20e4661cc9c97000

3.1 import pickle

Use pickle.dumps() to serialize an object into bytes, or use pickle.dump(object, file) to serialize the object and write it straight to a file-like object (note that one is dump and the other is dumps):

>>> import pickle
>>> d = dict(name='Chris',age=18, score=100)
>>> pickle.dumps(d)
b'\x80\x03}q\x00(X\x04\x00\x00\x00nameq\x01X\x05\x00\x00\x00Chrisq\x02X\x05\x00\x00\x00scoreq\x03KdX\x03\x00\x00\x00ageq\x04K\x12u.'

>>> f = open('output.txt','xb')
>>> pickle.dump(d,f)
>>> f.close()

The file must be opened in binary mode, otherwise an error is raised:

>>> f = open('output.txt','w')
>>> pickle.dump(d,f)
Traceback (most recent call last):
  File "<pyshell#139>", line 1, in <module>
    pickle.dump(d,f)
TypeError: write() argument must be str, not bytes

Deserialization:

pickle.load()

>>> f = open('output.txt','rb')
>>> d = pickle.load(f)
>>> f.close()
>>> d
{'name': 'Chris', 'score': 100, 'age': 18}

pickle.loads()

>>> pickle_bytes = pickle.dumps(d)
>>> pickle.loads(pickle_bytes)
{'name': 'Chris', 'score': 100, 'age': 18}
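The dump/load pair above can also be written with the with statement, so the file is closed even if pickling fails. A minimal sketch; the file name `output.pkl` and the temporary directory are assumptions for illustration.

```python
import os
import pickle
import tempfile

d = dict(name='Chris', age=18, score=100)

# Hypothetical output location for the demonstration.
pickle_path = os.path.join(tempfile.mkdtemp(), 'output.pkl')

# Serialize to a file opened in binary write mode.
with open(pickle_path, 'wb') as f:
    pickle.dump(d, f)

# Deserialize from the same file opened in binary read mode.
with open(pickle_path, 'rb') as f:
    restored = pickle.load(f)

print(restored == d)   # True: same content
print(restored is d)   # False: a new object was built
```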

3.2 import json

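JSON makes it convenient to pass objects between programming languages. A JSON value is represented as a string that any language can read. JSON is a standard format, faster than XML, and can be read directly in a web page.

An object represented in JSON is a standard JavaScript object. JSON types map to Python built-in types as follows:

JSON type	Python type
{}	dict
[]	list
"string"	str
1234.56	int or float
true/false	True/False
null	None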
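The type mapping in the table can be checked by parsing a small JSON document and inspecting the resulting Python types. The JSON text below is made up for illustration.

```python
import json

# A hypothetical JSON document containing one value of each JSON type.
parsed = json.loads('{"a": [1, 2.5, "text", true, false, null]}')

# Collect the Python type name of each element of the list.
type_names = [type(v).__name__ for v in parsed['a']]

print(parsed)       # {'a': [1, 2.5, 'text', True, False, None]}
print(type_names)   # ['int', 'float', 'str', 'bool', 'bool', 'NoneType']
```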

>>> import json
>>> json.dumps(d)
'{"name": "Chris", "score": 100, "age": 18}'
>>> type(json.dumps(d))
<class 'str'>

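Compare this with the result of pickle.dumps():

pickle.dumps() produces a sequence of binary bytes that is not human-readable.

json.dumps() returns a str whose content is standard JSON. Similarly, dump() writes the JSON directly to a file-like object.

Deserialization:

Use json.loads() to deserialize a JSON string, or the corresponding load() to read a string from a file-like object and deserialize it: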

>>> json_str = json.dumps(d)
>>> json.loads(json_str)
{'name': 'Chris', 'score': 100, 'age': 18}
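The file-based dump() and load() are not demonstrated in these notes, so here is a minimal sketch. It uses io.StringIO as the file-like object, which is an assumption made so the example needs no file on disk; a file opened in text mode works the same way.

```python
import io
import json

d = dict(name='Chris', age=18, score=100)

# io.StringIO is an in-memory text stream that behaves like an open text file.
buffer = io.StringIO()
json.dump(d, buffer)            # write JSON text into the file-like object
json_text = buffer.getvalue()

buffer.seek(0)                  # rewind before reading
restored = json.load(buffer)    # parse JSON text from the file-like object

print(type(json_text).__name__)   # str
print(restored == d)              # True
```

Unlike pickle, json works with str, so a real file would be opened in text mode ('w' and 'r') rather than binary mode.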

The dumps() function of the json module offers many optional parameters for customizing JSON serialization.
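A few of those optional parameters, shown as a short sketch. The choice of sort_keys, indent and ensure_ascii is illustrative and far from the full list; the sample data is made up.

```python
import json

d = dict(name='Chris', age=18, score=100)

# sort_keys=True orders the keys alphabetically in the output.
compact = json.dumps(d, sort_keys=True)

# indent=2 pretty-prints the output with two spaces per nesting level.
pretty = json.dumps(d, indent=2, sort_keys=True)

# By default non-ASCII characters are escaped; ensure_ascii=False keeps them.
escaped = json.dumps({'name': 'Zo\u00eb'})
unescaped = json.dumps({'name': 'Zo\u00eb'}, ensure_ascii=False)

print(compact)   # {"age": 18, "name": "Chris", "score": 100}
print(pretty)
print(escaped)   # {"name": "Zo\u00eb"}
```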

The optional default parameter takes a function that turns an arbitrary object into one that can be serialized as JSON.

For example:

import json

class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

s = Student('Chris', 18, 100)  # create an instance s of the Student class

def student2dict(std):  # convert a Student into an object that can be serialized as JSON
    return {
        'name': std.name,
        'age': std.age,
        'score': std.score
    }

# s is first converted to a dict by student2dict, then serialized to JSON
print(json.dumps(s, default=student2dict))
# Instances of most classes have a __dict__ attribute, which is a dict, so it can be used directly
# (classes that define __slots__ are an exception)
print(json.dumps(s, default=lambda obj: obj.__dict__))

output:

{"age": 18, "score": 100, "name": "Chris"}
{"age": 18, "score": 100, "name": "Chris"}

Deserialization likewise needs a conversion function, this time one that turns a dict into a Student instance:

def dict2student(d):
    return Student(d['name'], d['age'], d['score'])

json_str = json.dumps(s, default=lambda obj: obj.__dict__)
# loads() first builds a dict, then passes it to the object_hook function, which converts it into a Student instance
print(json.loads(json_str, object_hook=dict2student))

output:

<__main__.Student object at 0x00000000026B0EB8>
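The printed object address does not show whether the fields survived the round trip. The sketch below repeats the Student class and dict2student from above so it runs on its own, then checks the restored attributes; the variable name `restored` is introduced here for illustration.

```python
import json

class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

def dict2student(d):
    # Build a Student from the dict produced by the JSON decoder.
    return Student(d['name'], d['age'], d['score'])

s = Student('Chris', 18, 100)
json_str = json.dumps(s, default=lambda obj: obj.__dict__)
restored = json.loads(json_str, object_hook=dict2student)

print(isinstance(restored, Student))                  # True
print(restored.name, restored.age, restored.score)    # Chris 18 100
```

Keep in mind that object_hook is called for every JSON object that is decoded, including nested ones, so a hook like dict2student only fits JSON whose objects all have these three keys.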