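Python Files

File iterators are the best tool for reading lines: often the best way to read text from a text file is not to read the file explicitly at all, and to let the iterator fetch one line at a time.

Data read from a file always comes back to the script as a string.

Calling close is usually optional. Calling close terminates the connection to the external file.

Files are buffered by default and are seekable.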

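Writing to a file

myfile = open('myfile.txt', 'w')
myfile.write('hello textfile\n')  # write does not add a newline, so include it explicitly
myfile.write('goodbye text file\n')
myfile.close()  # flush buffered output and release the file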
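Since close is usually optional but still good practice, a context manager is one way to get it done automatically. The following is a minimal sketch; the scratch file name demo_with.txt and the use of the temp directory are assumptions for illustration, not part of the original notes.

```python
import os
import tempfile

# Hypothetical scratch path for illustration only.
path = os.path.join(tempfile.gettempdir(), 'demo_with.txt')

# The with statement closes the file when the block exits, even on error.
with open(path, 'w') as out:
    out.write('hello textfile\n')
    out.write('goodbye text file\n')

# Reading back through the file iterator, one line at a time.
with open(path) as src:
    lines = [line.rstrip('\n') for line in src]

print(out.closed, src.closed)
print(lines)
```

Both file objects report closed once their with blocks end, so no explicit close call is needed.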

Reading a file

myfile = open('myfile.txt')
print(myfile.readline())
print(myfile.readline())
print(myfile.readline())  # at end of file readline returns '', so this prints an empty line

hello textfile

goodbye text file

print(open('myfile.txt').read())

hello textfile

goodbye text file

File iterators are often the best choice

for line in open('myfile.txt'):
    print(line, end='')  # each line already ends with a newline

hello textfile

goodbye text file

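Python 3

Text file contents are represented as ordinary str strings. Unicode encoding and decoding are performed automatically, and end-of-line translation is performed by default.

Binary file contents are represented as a special bytes string.

Python 2

Ordinary files handle both 8-bit text and binary data, and a special string type (unicode) handles Unicode text.

The difference in Python 3 stems from the fact that simple text and Unicode text have been merged into one ordinary string type,

because all text is Unicode, including ASCII and other 8-bit encodings.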
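The text/binary distinction in Python 3 can be seen by reading the same file both ways. This is a small sketch under assumptions: the scratch file name demo_unicode.txt, the sample string, and the choice of UTF-8 are all illustrative and not from the original notes.

```python
import os
import tempfile

# Hypothetical scratch path for illustration only.
path = os.path.join(tempfile.gettempdir(), 'demo_unicode.txt')

text = 'sp\xe4m\n'  # contains one non-ASCII character (a with diaeresis)

# Text mode: str goes in and is encoded to bytes on the way out.
# newline='\n' turns off end-of-line translation so the bytes are the same on every platform.
with open(path, 'w', encoding='utf-8', newline='\n') as f:
    f.write(text)

# Text mode: bytes are decoded back to str on the way in.
with open(path, encoding='utf-8') as f:
    as_text = f.read()

# Binary mode: raw bytes, no decoding and no end-of-line translation.
with open(path, 'rb') as f:
    as_bytes = f.read()

print(type(as_text), repr(as_text))
print(type(as_bytes), as_bytes)
```

The text-mode read gives back the original str, while the binary read shows the two UTF-8 bytes that encode the non-ASCII character.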

Storing and parsing Python objects in files

x, y, z = 43, 44, 45
s = 'spam'
d = {'a':1, 'b':2}
l = [1, 2, 3]
f = open('datafile.txt','w')
f.write(s + '\n')  # objects must be converted to strings before writing
f.write('%s,%s,%s\n' % (x, y, z))
f.write(str(l) + '$' + str(d) + '\n')
f.close()

chars = open('datafile.txt').read()
print(chars)

spam

43,44,45

[1, 2, 3]${'a': 1, 'b': 2}

f = open('datafile.txt')
line = f.readline()
print(line)

spam

line = line.rstrip()  # strings are immutable: rstrip returns a new string
print(line)

spam

line = f.readline()
print(line)

43,44,45

parts = line.split(',')
print(parts)

['43', '44', '45\n']

print(int(parts[1])) # 44
numbers = [int(p) for p in parts]
print(numbers) # [43, 44, 45]

int and some other conversion functions ignore surrounding whitespace
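A quick check of that whitespace behavior, using made-up sample strings rather than the data file itself:

```python
# int and float accept leading and trailing whitespace, including newlines.
a = int('45\n')
b = int('  44  ')
c = float(' 1.5\n')
print(a, b, c)

# So the last field of a split line converts without an explicit rstrip.
numbers = [int(p) for p in '43,44,45\n'.split(',')]
print(numbers)
```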

line = f.readline()
print(line) # [1, 2, 3]${'a': 1, 'b': 2}
parts = line.split('$')
print(parts) # ['[1, 2, 3]', "{'a': 1, 'b': 2}\n"]
print(eval(parts[0])) # [1, 2, 3]
obj = [eval(p) for p in parts]
print(obj) # [[1, 2, 3], {'a': 1, 'b': 2}]
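eval runs any expression it is given, which is risky for files from untrusted sources. One commonly used alternative, swapped in here and not part of the original notes, is ast.literal_eval from the standard library, which only accepts Python literals. A minimal sketch on the same line format:

```python
import ast

line = "[1, 2, 3]${'a': 1, 'b': 2}\n"
parts = line.split('$')

# literal_eval parses literals (lists, dicts, numbers, strings, ...) only.
objects = [ast.literal_eval(p.strip()) for p in parts]
print(objects)

# Anything that is not a literal, such as a function call, is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```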

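Storing native Python objects with pickle

import pickle
d = {'a':1, 'b':2}
f = open('datafile.pkl','wb')
pickle.dump(d,f)  # serialize the dict into the binary file
f.close()
f = open('datafile.pkl','rb')
e = pickle.load(f)  # rebuild an equal dict from the file
print(e) # {'a': 1, 'b': 2}
print(open('datafile.pkl','rb').read())  # raw bytes; the exact value depends on the pickle protocol

b'\x80\x03}q\x00(X\x01\x00\x00\x00aq\x01K\x01X\x01\x00\x00\x00bq\x02K\x02u.'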
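The same round trip can be done in memory with pickle.dumps and pickle.loads, which makes it easy to see that unpickling builds a new, equal object. This is a small sketch with an illustrative dictionary; the byte string is not printed here because it varies with the protocol version.

```python
import pickle

d = {'a': 1, 'b': [1, 2, 3]}

data = pickle.dumps(d)   # bytes; content depends on the pickle protocol in use
e = pickle.loads(data)   # a new object rebuilt from those bytes

print(type(data))
print(e == d, e is d)
```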

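Storing and parsing packed binary data in files

The struct module can construct and parse packed binary data.

To create a packed binary data file, open it in 'wb' mode and pass struct a format string and some Python objects. The format string used here describes a packet made of a 4-byte integer, a 4-character string, and a 2-byte integer, all in big-endian form.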

f = open('data.bin','wb')
import struct
data = struct.pack('>i4sh',7,b'spam',8)
print(data)
f.write(data)
f.close()

f = open('data.bin', 'rb')
data = f.read()
print(data) # b'\x00\x00\x00\x07spam\x00\x08'
values = struct.unpack('>i4sh',data)
print(values) # (7, b'spam', 8)
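To make the format string more concrete, the sketch below checks the packed size with struct.calcsize and compares big-endian and little-endian layouts. The little-endian format '<i4sh' is added for contrast and is not used in the original notes.

```python
import struct

fmt = '>i4sh'  # big-endian: 4-byte int, 4-byte string, 2-byte short

size = struct.calcsize(fmt)
print(size)  # 4 + 4 + 2 bytes, no padding with an explicit byte order

big = struct.pack(fmt, 7, b'spam', 8)
little = struct.pack('<i4sh', 7, b'spam', 8)  # same fields, low-order byte first
print(big)
print(little)

# Unpacking returns a tuple that can be spread into names.
number, text, short = struct.unpack(fmt, big)
print(number, text, short)
```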

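Other file tools

Standard streams: preopened file objects in the sys module, such as sys.stdout

Descriptor files in the os module

Sockets, pipes, and FIFO files

Access-by-key files, where stored objects are looked up by key

Shell command streams: os.popen and subprocess.Popen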
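Two of the tools listed above can be tried in a few lines. This is only a sketch: the child command (a second Python interpreter printing one line) is an assumption chosen so the example does not depend on any particular shell.

```python
import subprocess
import sys

# sys.stdout is a preopened file object; print() writes to it by default.
sys.stdout.write('written through sys.stdout\n')

# Start a child process and read its standard output through a pipe.
proc = subprocess.Popen(
    [sys.executable, '-c', "print('hello from child')"],
    stdout=subprocess.PIPE,
    text=True,  # decode the pipe's bytes to str
)
output, _ = proc.communicate()  # wait for the child and collect its output
print(output.strip())
```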

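Type categories revisited

Objects share operations according to their category; for example, str, list, and tuple all share sequence operations such as concatenation, length, and indexing.

Only mutable objects can be changed in place.

Files export only methods, so mutability does not really apply to them.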

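Object classifications

Object type     Category     Mutable?
Numbers         Numeric      No
Strings         Sequence     No
Lists           Sequence     Yes
Dictionaries    Mapping      Yes
Tuples          Sequence     No
Files           Extension    N/A
Sets            Set          Yes
frozenset       Set          No
bytearray       Sequence     Yes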
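The categories in the table can be checked directly. The sketch below, with illustrative sample values, applies the same sequence operations to a str, a list, and a tuple, then shows that only the mutable list accepts an in-place change.

```python
results = []
for seq in ('spam', ['s', 'p', 'a', 'm'], ('s', 'p', 'a', 'm')):
    # Length, indexing, and concatenation work for every sequence type.
    results.append((type(seq).__name__, len(seq), seq[0], seq + seq[:1]))
print(results)

l = [1, 2, 3]
l[0] = 99          # lists are mutable: changed in place
print(l)

t = (1, 2, 3)
try:
    t[0] = 99      # tuples are immutable: item assignment fails
    error = None
except TypeError as exc:
    error = str(exc)
print(error)
```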

l = ['abc', [(1,2),([3],4)],5]
print(l[1]) # [(1, 2), ([3], 4)]
print(l[1][1]) # ([3], 4)
print(l[1][1][0]) # [3]

References vs. copies

x = [1,2,3]
l = ['a',x,'b']
print(l) # ['a', [1, 2, 3], 'b']
d = {'x':x,'y':2}
print(d) # {'x': [1, 2, 3], 'y': 2}
x[1] = 'surprise'
print(l) # ['a', [1, 'surprise', 3], 'b']
print(d) # {'x': [1, 'surprise', 3], 'y': 2}

x = [1,2,3]
l = ['a',x[:],'b']
print(l) # ['a', [1, 2, 3], 'b']
d = {'x':x[:],'y':2}
print(d) # {'x': [1, 2, 3], 'y': 2}
x[1] = 'surprise'
print(l) # ['a', [1, 2, 3], 'b']
print(d) # {'x': [1, 2, 3], 'y': 2}

import copy
l = [1,2,3]
d = {'a':1,'b':2}
e = l[:]
D = d.copy()
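Slices and the dict copy method make top-level (shallow) copies only, so nested objects are still shared. When nested parts must be independent too, copy.deepcopy from the standard library is one option. A minimal sketch with an illustrative nested list:

```python
import copy

nested = [1, [2, 3]]

shallow = nested[:]            # new outer list, same inner list object
deep = copy.deepcopy(nested)   # new outer list and new inner list

nested[1][0] = 'changed'

print(shallow)
print(deep)
```

The shallow copy sees the change to the inner list, while the deep copy does not.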

Comparisons, equality, and truth

l1 = [1,2,4]
l2 = [1,2,4]
print(l1 == l2, l1 is l2) # True False

s1 = 'spam'
s2 = 'spam'
print(s1 == s2, s1 is s2) # True True
a = 'a long strings qqq'
b = 'a long strings qqq'
print(a == b, a is b) # True True in a script, but the `is` result for strings depends on interpreter caching

d1 = {'a':1,'b':2}
d2 = {'a':1,'b':3}
print(sorted(d1.items()) < sorted(d2.items())) # True
print(sorted(d1.keys()) < sorted(d2.keys())) # False
print(sorted(d1.values()) < sorted(d2.values())) # True

True values: 'spam' 1

False values: '' [] {} () 0.0 None
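The truth values listed above can be confirmed with bool, which applies the same test that if and while use:

```python
samples = ['spam', 1, '', [], {}, (), 0.0, None]

for obj in samples:
    print(repr(obj), bool(obj))

truthy = [obj for obj in samples if obj]
falsy = [obj for obj in samples if not obj]
print(truthy)
print(falsy)
```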

l = [None] *4
print(l) # [None, None, None, None]
print(type([1]) == type([])) # True
print(type([1]) == list) # True
print(isinstance([1],list)) # True
import types
def f():pass
print(type(f) == types.FunctionType) # True

Built-in type gotchas

Assignment creates references, not copies

l = [1,2,3]
m = ['x',l,'y']
print(m) # ['x', [1, 2, 3], 'y']
l[1] = 0
print(m) # ['x', [1, 0, 3], 'y']

To avoid this problem, use slicing to make a top-level copy

l = [1,2,3]
m = ['x',l[:],'y']
l[1] = 0
print(m) # ['x', [1, 2, 3], 'y']

Repetition adds one level of depth

l = [4,5,6]
x = l * 3
y = [l] * 3
print(x) # [4, 5, 6, 4, 5, 6, 4, 5, 6]
print(y) # [[4, 5, 6], [4, 5, 6], [4, 5, 6]]
l[1] = 0
print(x) # [4, 5, 6, 4, 5, 6, 4, 5, 6]
print(y) # [[4, 0, 6], [4, 0, 6], [4, 0, 6]]
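When the repeated rows need to be independent, one common approach is to copy the list for each row, for example with a list comprehension. A minimal sketch using the same starting list:

```python
l = [4, 5, 6]

shared = [l] * 3                            # three references to the same list
independent = [list(l) for _ in range(3)]   # three separate copies

l[1] = 0

print(shared)
print(independent)
```

Only the rows built with repetition follow the later change to l.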

Beware of cyclic data structures

l = ['grail']
l.append(l)
print(l) # ['grail', [...]]

Immutable types can't be changed in place

t = (1,2,3)
t = t[:2] + (4,)
print(t) # (1, 2, 4)
