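Common Python File I/O Operations

Common file I/O operations

open: open a file
 read: read
 write: write
 close: close
 readline: read one line
 readlines: read multiple lines
 seek: move the file pointer
 tell: report the pointer position
 others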

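Opening a file with open

open() opens a file and returns a file object (a stream object) that wraps a file descriptor. If the file cannot be opened, an exception is raised.

Basic usage:
Create a file named test, open it, and close it when you are done. If test does not exist, open raises FileNotFoundError, as in the traceback below.

f = open("test") # file object
# Windows <_io.TextIOWrapper name='test' mode='r' encoding='cp936'>
# Linux <_io.TextIOWrapper name='test' mode='r' encoding='UTF-8'>
print(f.read()) # read the file
f.close() # close the file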

---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)

<ipython-input-1-f14d31593ce4> in <module>
----> 1 f = open("test") # file object
      2 # Windows <_io.TextIOWrapper name='test' mode='r' encoding='cp936'>
      3 # Linux <_io.TextIOWrapper name='test' mode='r' encoding='UTF-8'>
      4 print(f.read()) # read the file
      5 f.close() # close the file


FileNotFoundError: [Errno 2] No such file or directory: 'test'

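Reading and writing are the most common file operations.
Files can be accessed in two modes: text mode and binary mode. The available operations and their results differ between the two.

# Parameters of open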

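file

The name of the file to open or create. If no directory is given, the name is resolved against the current working directory.

mode

r default; open for reading only
 w open for writing only
 x create a new file and open it for writing
 a open for writing; if the file exists, append to it
 b binary mode
 t default; text mode
 + open for both reading and writing; adds the missing read or write capability to a mode that is otherwise read-only or write-only
The example above shows that the default mode is text mode and read-only. Examples: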

r mode

```python
# r mode
f = open('test')
f.read()
f.write('abc') # the default mode is read-only, so the file cannot be written
f.close()
```

---------------------------------------------------------------------------

UnsupportedOperation                      Traceback (most recent call last)

<ipython-input-2-6a7832c033e1> in <module>
      2 f = open('test')
      3 f.read()
----> 4 f.write('abc')
      5 f.close()


UnsupportedOperation: not writable

w mode

```python
# w mode
# Open in write mode, write, and close (w overwrites any existing data),
# then reopen in the default read mode and read the file back.
f = open('test','w')
f.write('好')
f.close()
f = open('test')
f.read()
```

'好'

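By default, open uses the read-only mode r and expects the file to exist.
r
Opens the file for reading only; calling write raises an exception
If the file does not exist, FileNotFoundError is raised
w
Opens the file for writing only; attempting to read raises an exception
If the file does not exist, it is created
If the file exists, its contents are truncated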

x mode

f = open('test2','x')
f.write('abc')
f.close()
f = open('test2')
f.read()

'abc'

f = open('test2','x')

---------------------------------------------------------------------------

FileExistsError                           Traceback (most recent call last)

<ipython-input-10-ba6940455994> in <module>
----> 1 f = open('test2','x')


FileExistsError: [Errno 17] File exists: 'test2'

x
If the file does not exist, it is created and opened for writing only
If the file exists, FileExistsError is raised

a mode

f = open('test','a')
f.write('def')
f.close()
f = open('test')
f.read()

'好def'

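If the file exists, it is opened for writing only and new content is appended
If the file does not exist, it is created, opened for writing only, and content is appended

r is read-only; w, x, and a are all write-only
w, x, and a can all create a new file. w always produces a file with fresh content, whether or not the file already existed; a appends at the end of the file, whether or not it already existed; x requires that the file not exist beforehand and creates it itself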
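
The differences between x, a, and w can be checked side by side. The following is a minimal sketch, assuming a throwaway temporary directory and a hypothetical file name (modes_demo.txt); neither appears in the original examples. It uses with blocks only to make sure each handle is closed.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched (assumption).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "modes_demo.txt")  # hypothetical file name

# x: create a brand-new file; fails if the file already exists.
with open(path, "x", encoding="utf-8") as f:
    f.write("abc")

try:
    open(path, "x", encoding="utf-8")
    x_failed = False
except FileExistsError:
    x_failed = True

# a: keep the existing content and append at the end.
with open(path, "a", encoding="utf-8") as f:
    f.write("def")
with open(path, encoding="utf-8") as f:
    after_append = f.read()

# w: truncate the file first, then write.
with open(path, "w", encoding="utf-8") as f:
    f.write("xyz")
with open(path, encoding="utf-8") as f:
    after_write = f.read()

print(x_failed, after_append, after_write)
```

The second x open fails because the file now exists, a preserves "abc" and extends it, and w discards everything written before.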

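t mode

Text mode t
A character stream: the bytes of the file are interpreted using a character encoding, and operations work on characters. The default mode of open is rt.

b mode

A byte stream: the file is treated as raw bytes, independent of any character encoding. In binary mode, reads and writes use the bytes type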

f = open('test','rb') # binary, read-only
s = f.read()
print(type(s)) # bytes
print(s)
f.close() # close the file
f = open('test','wb') # IO object
s = f.write('啊哈呵'.encode())
print(s)
f.close()

<class 'bytes'>
b'\xe5\xa5\xbddef'
9

+ mode

```python
# f = open('test','r+')
# s = f.read()
# f.write('马哥教育1')
# print(f.read())
# f.close()
# !cat test

f = open('test','a')
f.write('123')
print(f.read())
f.close()
```



    ---------------------------------------------------------------------------
    
    UnsupportedOperation                      Traceback (most recent call last)
    
    <ipython-input-53-72f745833071> in <module>
          8 f = open('test','a')
          9 f.write('123')
    ---> 10 print(f.read())
         11 f.close()


    UnsupportedOperation: not readable
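
Adding + to a mode gives it the capability it lacks, which avoids the "not readable" error shown above. The following is a minimal sketch, assuming a temporary directory and a hypothetical file name (plus_demo.txt) that are not part of the original examples.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "plus_demo.txt")  # hypothetical file name

# w+: truncate (or create) the file, then allow reading back what was written.
with open(path, "w+", encoding="utf-8") as f:
    f.write("abc")
    f.seek(0)                 # rewind before reading
    w_plus_content = f.read()

# r+: open an existing file for reading and writing without truncating it.
with open(path, "r+", encoding="utf-8") as f:
    r_plus_before = f.read()  # the pointer is now at EOF
    f.write("def")            # so this write lands at the end

# a+: writes always go to the end; seek(0) is needed before reading.
with open(path, "a+", encoding="utf-8") as f:
    f.write("123")
    f.seek(0)
    a_plus_content = f.read()

print(w_plus_content, r_plus_before, a_plus_content)
```

In each case the read only returns data because the pointer was moved back to the start first; reading right after a write would start at EOF and return an empty string.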

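### File pointer
The examples above already rely on a file pointer, which marks the current byte position in the file
mode = r: the pointer starts at 0
mode = a: the pointer starts at EOF

[tell() reports the current pointer position](#tell)
[seek(offset[,whence]) moves the file pointer](#seek)
offset is the number of bytes to move; whence is the position to start from.
In text mode
whence 0: the default; start from the beginning of the file; offset must be non-negative
whence 1: start from the current position; offset must be 0
whence 2: start from EOF; offset must be 0
Text mode only supports offsets measured forward from the beginning of the file.
In binary mode
whence 0: the default; start from the beginning of the file; offset must be non-negative
whence 1: start from the current position; offset may be positive or negative
whence 2: start from EOF; offset may be positive or negative

Examples: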
<h1 id="tell">tell()</h1>
f = open('test')

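f.tell() # opened in the default read mode, so the pointer is at the start of the file

f.read(1) # read the first character

'马'

f.tell() # the pointer is now at byte offset 3, where the next read starts; a Chinese character takes three bytes in UTF-8

f.read(1) # read the second character

'哥'

f.tell() # a Chinese character takes three bytes in UTF-8

f.close()

f = open('test','a+') # append mode

f.tell() # the pointer starts at the end of the file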

!cat test

马哥1123123123123123123

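f.write('你好') # append

f.tell()

!cat test # this is the file pointer at work; avoid writing through a handle opened in read mode (r+): its pointer starts at the beginning of the file, so the write overwrites existing data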

你好1123123123123123123

!echo "马哥1123123123123123123" > test

f = open('test','r+')

f.tell()

!cat test

马哥1123123123123123123

f.write('你好')

f.tell()

!cat test

你好1123123123123123123
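
The effect of the starting pointer position can be reproduced without shell commands. The following is a minimal sketch, assuming a temporary directory and a hypothetical file name (pointer_demo.txt); it compares where r+ and a+ place the pointer and what a write does in each case.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pointer_demo.txt")  # hypothetical file name

with open(path, "w", encoding="utf-8") as f:
    f.write("马哥123")       # 3 + 3 + 3 = 9 bytes in UTF-8

# r+: the pointer starts at 0, so writing replaces the leading bytes.
with open(path, "r+", encoding="utf-8") as f:
    start_r_plus = f.tell()
    f.write("你好")
with open(path, encoding="utf-8") as f:
    after_r_plus = f.read()

# a+: the pointer starts at EOF, so writing extends the file.
with open(path, "a+", encoding="utf-8") as f:
    start_a_plus = f.tell()
    f.write("!")
with open(path, encoding="utf-8") as f:
    after_a_plus = f.read()

print(start_r_plus, after_r_plus, start_a_plus, after_a_plus)
```

The r+ write replaces the first six bytes (two Chinese characters) in place, while the a+ write leaves the existing content intact.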

seek: moving the file pointer

```python
# text mode
!echo "abcdefg" > test
f = open('test','r+')
```

f.tell() # start of the file

f.read()

'abcdefg\n'

f.tell() # EOF

f.seek(0) # back to the start

f.read()

'abcdefg\n'

f.seek(2,0)

f.read()

'cdefg\n'

f.seek(2,0) # whence=0 accepts a non-zero offset

The two tracebacks below were recorded after f had already been closed, hence the ValueError. On an open text-mode file, a non-zero offset with whence=1 or whence=2 raises io.UnsupportedOperation instead.

f.seek(2,1) # offset must be 0

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-227-8e4bcc97d95f> in <module>
----> 1 f.seek(2,1) # offset must be 0


ValueError: I/O operation on closed file.

f.seek(2,2) # offset must be 0

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-226-8025d2a83e5a> in <module>
----> 1 f.seek(2,2) # offset must be 0


ValueError: I/O operation on closed file.

f.close()

# Chinese text
f = open('test','w+')

f.write('你好世界')

f.tell()

!cat test

你好世界

f.close()

f = open('test','r+')

f.read(3)

'你好世'

f.seek(1)

f.tell()

f.read()

---------------------------------------------------------------------------

UnicodeDecodeError                        Traceback (most recent call last)

<ipython-input-297-571e9fb02258> in <module>
----> 1 f.read()


/usr/lib64/python3.6/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]


UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbd in position 0: invalid start byte

f.seek(2)

f.read()

---------------------------------------------------------------------------

UnicodeDecodeError                        Traceback (most recent call last)

<ipython-input-300-571e9fb02258> in <module>
----> 1 f.read()


/usr/lib64/python3.6/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]


UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 0: invalid start byte

f.seek(3)

f.read()

'好世界'

f.close()
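
The UnicodeDecodeError above comes from seeking into the middle of a multi-byte character. One way to avoid it in text mode is to seek only to positions previously returned by tell(). The following is a minimal sketch, assuming a temporary directory and a hypothetical file name (seek_demo.txt); it also shows the relative-seek restriction of text mode and the freedom of binary mode.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write("你好世界")

with open(path, encoding="utf-8") as f:
    f.read(1)                 # consume the first character
    bookmark = f.tell()       # a position that lies on a character boundary
    rest = f.read()
    f.seek(bookmark)          # safe: the value came from tell()
    rest_again = f.read()

    # Relative seeks with a non-zero offset are rejected in text mode.
    try:
        f.seek(2, os.SEEK_CUR)
        relative_seek_error = None
    except io.UnsupportedOperation as exc:
        relative_seek_error = str(exc)

# Binary mode counts bytes, so negative offsets from EOF are allowed.
with open(path, "rb") as f:
    f.seek(-3, os.SEEK_END)   # the last three bytes hold the last character
    last_char = f.read().decode("utf-8")

print(rest, rest_again, relative_seek_error, last_char)
```

In binary mode the caller is responsible for landing on a character boundary before decoding; here the offset of -3 works because each of these characters occupies three bytes in UTF-8.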

read: reading

read(size=-1): size is the number of characters (text mode) or bytes (binary mode) to read; a negative value or None reads to EOF

f = open('test','r+',1)

f.write('hello,world')

f.write('\n')

f.write('你好世界')

f.seek(0)

f.read()

'hello,world\n你好世界'

f.close()

# binary
f = open('test','rb+')

f.read(7)

b'hello,w'

f.read(1)

b'o'

f.close()
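
The size argument counts characters in text mode and bytes in binary mode, and read returns an empty result once EOF is reached. The following is a minimal sketch, assuming a temporary directory, a hypothetical file name (read_demo.txt), and an arbitrary chunk size of 8 bytes. The file is written in binary mode so that its byte length is the same on every platform.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")  # hypothetical file name
with open(path, "wb") as f:
    f.write("hello,world\n你好世界".encode("utf-8"))   # 12 + 12 = 24 bytes

# Text mode: size counts characters.
with open(path, encoding="utf-8") as f:
    text_head = f.read(7)     # seven characters
    text_rest = f.read()      # everything up to EOF
    text_eof = f.read(1)      # empty string once EOF is reached

# Binary mode: size counts bytes; loop until read() returns b''.
chunks = []
with open(path, "rb") as f:
    while True:
        chunk = f.read(8)     # chunk size chosen arbitrarily for the sketch
        if not chunk:
            break
        chunks.append(chunk)

print(text_head, repr(text_eof), len(chunks))
```

Reading in fixed-size chunks keeps memory use bounded for large files, at the cost of possibly splitting a multi-byte character across two chunks.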

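readline: reading a line

readlines: reading multiple lines

readline(size=-1) reads the file one line at a time; size limits how many characters or bytes of the line are read in one call
readlines(hint=-1) returns a list of all lines; if hint is given, reading stops once the total size of the lines read so far exceeds hint

f = open('test','r+')

f.readline() # read one line at a time; \n is the newline character

'hello,world\n'

f.readline() # read one line at a time; \n is the newline character

'你好世界'

f.seek(0) # rewind so that readlines() starts from the first line

f.readlines()

['hello,world\n', '你好世界']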
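
readline, readlines, and plain iteration all consume the file from the current pointer position. The following is a minimal sketch, assuming a temporary directory and a hypothetical file name (lines_demo.txt), that shows how the three interact.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")  # hypothetical file name
with open(path, "wb") as f:
    f.write("hello,world\n你好世界".encode("utf-8"))

with open(path, encoding="utf-8") as f:
    first = f.readline()        # includes the trailing newline
    partial = f.readline(2)     # at most two characters of the next line
    remaining = f.readlines()   # whatever is left after the pointer

# Iterating over the file object yields one line at a time.
with open(path, encoding="utf-8") as f:
    stripped = [line.rstrip("\n") for line in f]

print(repr(first), partial, remaining, stripped)
```

Iterating over the file object is usually preferable to readlines() for large files, because it does not build the whole list in memory.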

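write: writing

write(s) writes the string s to the file and returns the number of characters written

close: closing

close() flushes and closes the file object. Calling close() again on a file that is already closed has no effect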
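
The return value of write and the behaviour of a repeated close can be verified directly. The following is a minimal sketch, assuming a temporary directory and a hypothetical file name (write_demo.txt); it also queries readable(), writable(), seekable(), and closed on the same handle.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "write_demo.txt")  # hypothetical file name

f = open(path, "w", encoding="utf-8")
written = f.write("你好abc")      # counts characters, not bytes
capabilities = (f.readable(), f.writable(), f.seekable())
f.close()
f.close()                         # closing again is a no-op
is_closed = f.closed

# Any further I/O on the closed object raises ValueError.
try:
    f.write("more")
    error_after_close = None
except ValueError as exc:
    error_after_close = str(exc)

size_on_disk = os.path.getsize(path)   # bytes actually stored
print(written, capabilities, is_closed, error_after_close, size_on_disk)
```

The five characters occupy nine bytes on disk because each Chinese character takes three bytes in UTF-8.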

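Others

seekable(): whether the file supports seek
readable(): whether the file is readable
writable(): whether the file is writable
closed: whether the file has been closed

### Context management

1. The problem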

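lst = [] # empty list that will hold the open file objects
for _ in range(2000): # range() produces a sequence of integers and is typically used in for loops
    lst.append(open('test')) # open test and keep the file object in the list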

---------------------------------------------------------------------------

OSError                                   Traceback (most recent call last)

<ipython-input-387-e2de46ba0013> in <module>
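      1 lst = [] # empty list that will hold the open file objects
      2 for _ in range(2000): # range() produces a sequence of integers and is typically used in for loops
----> 3     lst.append(open('test')) # open test and keep the file object in the list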


OSError: [Errno 24] Too many open files: 'test'

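print(len(lst)) # len() returns the length of an object (string, list, tuple, and so on), that is, its number of items

lsof lists open files. If it is not installed, run: yum install lsof
$ lsof -p 1399 | grep test | wc -l

!lsof -p 959 | grep test | wc -l

ulimit -a shows all limits. The open files entry is the limit on the number of open files, 1024 by default

for x in lst:
    x.close()

Close the files one by one, after which new files can be opened again. Check lsof once more. How can this be solved?
1. Exception handling
Catch the exception when it occurs. However, many pieces of code can raise OSError, so it is hard to tell whether a given exception was caused by the resource limit.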

f = open('test')
try:
    f.write('abc') # the file is read-only, so the write fails
finally:
    f.close() # the file is closed even though the write failed
    print(f.closed)

True



---------------------------------------------------------------------------

UnsupportedOperation                      Traceback (most recent call last)

<ipython-input-8-5115e18cccb3> in <module>
      1 f = open('test')
      2 try:
----> 3     f.write('abc') # the file is read-only, so the write fails
      4 finally:
      5     f.close() # the file is closed even though the write failed


UnsupportedOperation: not writable

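Using finally guarantees that the opened file gets closed.

2. Python context management
A special syntax that hands the job of releasing the file object over to the interpreter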

del f
with open('test') as f:
    f.write('abc') # the file is read-only, so the write fails

---------------------------------------------------------------------------

UnsupportedOperation                      Traceback (most recent call last)

<ipython-input-15-db017c97c470> in <module>
      1 del f
      2 with open('test') as f:
----> 3     f.write('abc') # the file is read-only, so the write fails
      4 
      5 # check whether f is closed


UnsupportedOperation: not writable

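# check whether f is closed
f.closed  # f is still in scope here

True

1. Use the with ... as keywords
2. The block under a with statement does not open a new scope
3. When the with statement finishes, the file object is closed automatically

Another way to write it: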

f1 = open('test')
with f1:
    f1.write('abc') # the file is read-only, so the write fails

---------------------------------------------------------------------------

UnsupportedOperation                      Traceback (most recent call last)

<ipython-input-17-3ef7ddf3fe9b> in <module>
      1 f1 = open('test')
      2 with f1:
----> 3     f1.write('abc') # the file is read-only, so the write fails


UnsupportedOperation: not writable

f1.closed

True

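IO objects such as file objects should generally be closed or released as soon as they are no longer needed, so that their resources are freed.
When an IO object is opened, it obtains a file descriptor. Computer resources are finite, so the operating system imposes limits. These limits keep the machine's resources from being exhausted: computing resources are shared, not exclusive.
In general, do not raise resource limits to work around a problem unless you know the resource situation precisely.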
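
The two approaches, try/finally and the with statement, can be compared in one runnable piece. The following is a minimal sketch, assuming a temporary directory and a hypothetical file name (ctx_demo.txt); the loop count of 2000 mirrors the earlier example that exhausted the open-file limit.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "ctx_demo.txt")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write("abc")

# Approach 1: try/finally closes the file even when the body fails.
failed_in_try = False
f = open(path, encoding="utf-8")
try:
    f.write("x")                       # read-only handle, so this fails
except io.UnsupportedOperation:
    failed_in_try = True
finally:
    f.close()
closed_after_finally = f.closed

# Approach 2: with closes the file on exit, including on an exception.
failed_in_with = False
try:
    with open(path, encoding="utf-8") as g:
        g.write("x")
except io.UnsupportedOperation:
    failed_in_with = True
closed_after_with = g.closed           # g is still in scope after the block

# Each handle is released before the next one is opened, so the
# open-file limit is never approached.
for _ in range(2000):
    with open(path, encoding="utf-8") as h:
        h.read()
all_released = h.closed

print(closed_after_finally, closed_after_with, all_released)
```

Because every handle is closed as soon as its block ends, the process never holds more than one descriptor for this file at a time.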

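Path operations

Since version 3.4
the pathlib module is recommended. It provides the Path object for working with both directories and files.
pathlib module
 Getting the path
 Parent paths
 Wildcards
 Matching
 File operations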

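pathlib module

# pathlib module
from pathlib import Path
# directory operations
# initialization
p = Path() # no arguments: the current directory
print(p)
p = Path('a','b','c/d') # a/b/c/d under the current directory
print(p)
p = Path('/etc') # the etc directory under the root
print(p)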

.
a/b/c/d
/etc

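Joining and splitting paths
The / operator
Path object / Path object
Path object / string, or string / Path object
Splitting
The parts attribute returns every component of the path
joinpath
joinpath(*other) appends multiple strings to the Path object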

p = Path()
p = p / 'a'
print(p)
p1 = 'b' / p
print(p1)
p2 = Path('c')
p3 = p2 / p1
print(p3.parts)
p3.joinpath('etc','init.d',Path('httpd'))

a
b/a
('c', 'b', 'a')

PosixPath('c/b/a/etc/init.d/httpd')

Getting the path

str returns the path as a string
bytes returns the path string as bytes

p = Path('/etc')
print(str(p),bytes(p))

/etc b'/etc'

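Parent directories

parent: the logical parent directory of the path
parents: the sequence of ancestor directories; index 0 is the immediate parent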

p = Path('a/b/c/d')
print(p.parent.parent.parent)
print(p.parent.parent)
for x in p.parents:
    print(x)

a
a/b
a/b/c
a/b
a
.

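name, stem, suffix, suffixes, with_suffix(suffix), with_name(name)
name: the last component of the path
suffix: the file extension of the last component
stem: the last component without its suffix
suffixes: a list of all the file extensions
with_suffix(suffix): returns a new path with the suffix replaced; if the path has no suffix, the new one is appended
with_name(name): returns a new path with the last component replaced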

p = Path('/etc/sysconfig/network-scripts/ifcfg.enp0s3')
print(p.stem)
print(p.name)
p1 = Path(str(p) + '.gz')
print(p1)
print(p1.suffix)
print(p1.suffixes)
print(p1.with_suffix('.zig'))
print(p1.with_name('nihao'))
# another approach
print(p1.parent / 'test') # join

ifcfg
ifcfg.enp0s3
/etc/sysconfig/network-scripts/ifcfg.enp0s3.gz
.gz
['.enp0s3', '.gz']
/etc/sysconfig/network-scripts/ifcfg.enp0s3.zig
/etc/sysconfig/network-scripts/nihao
/etc/sysconfig/network-scripts/test
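
The name-related attributes can be explored without touching the file system by using PurePosixPath, which behaves the same on every platform. The following is a minimal sketch; the path strings and the variable names are illustrative only.

```python
from pathlib import PurePosixPath

# PurePosixPath does no file system access, so the path need not exist.
p = PurePosixPath("/etc/sysconfig/network-scripts/ifcfg.enp0s3.gz")

parts = p.parts
name, stem = p.name, p.stem
suffix, suffixes = p.suffix, p.suffixes

replaced = p.with_suffix(".zip")                     # existing suffix is replaced
added = PurePosixPath("notes").with_suffix(".txt")   # no suffix yet, so it is appended
renamed = p.with_name("nihao")                       # last component is replaced

# parents is ordered from the immediate parent outwards.
ancestors = [str(x) for x in PurePosixPath("a/b/c/d").parents]

print(name, stem, suffix, suffixes)
print(replaced, added, renamed, ancestors)
```

Note that stem only strips the final suffix, so a name with two extensions keeps the first one.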

cwd() returns the current working directory
home() returns the current user's home directory

print(p)
print(p.cwd())
print(p.home())

/etc/sysconfig/network-scripts/ifcfg.enp0s3
/root
/root

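is_dir(): whether the path is a directory
is_file(): whether the path is a regular file
is_symlink(): whether the path is a symbolic link
is_socket(): whether the path is a socket file
is_block_device(): whether the path is a block device
is_char_device(): whether the path is a character device
is_absolute(): whether the path is absolute
resolve(): returns a new path that is the absolute path of the current Path object; symbolic links are resolved
absolute(): also returns an absolute path, but resolve() is recommended
exists(): whether the directory or file exists
rmdir(): removes an empty directory. No method is provided for checking whether a directory is empty
touch(mode=0o666, exist_ok=True): creates a file
as_uri(): returns the path as a URI, for example "file:///etc/passwd"
mkdir(mode=0o777, parents=False, exist_ok=False)
parents: whether to create missing parent directories. True is equivalent to mkdir -p; with False, a missing parent raises FileNotFoundError
exist_ok: added in version 3.5. With False, an existing path raises FileExistsError; with True, the FileExistsError is ignored
iterdir(): iterates over the entries of the current directory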

from pathlib import Path
p = Path()
print(p)
print(p.cwd())
p /= 'a/b/c/d'
print(p)
p.exists() # False: the directory has not been created yet

.
/root
a/b/c/d





False

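# create the directory
# p.mkdir() # calling it directly raises FileNotFoundError
# print(p) # the parent directories do not exist
p.mkdir(parents=True) # create the missing parents as well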
p.exists() #True
# p.mkdir(parents=True)
# p.mkdir(parents=True,exist_ok=True)
# p /= 'readme.txt'
# p.parent.rmdir()
# p.parent.exists() #False '/a/b/c'
# p.mkdir() #FileNotFoundError

True

# Iterate and determine each entry's type; for a directory, check whether it is empty
for x in p.parents[len(p.parents)-1].iterdir():
    print(x,end='\t')
    if x.is_dir():
        flag = False
        for _ in x.iterdir():
            flag = True
            break
        # could a for ... else clause be used here instead?
        print('dir','Not Empty' if flag else 'Empty',sep='\t')
    elif x.is_file():
        print('file')
    else:
        print('other file')

.bash_logout	file
.bash_profile	file
.bashrc	file
.cshrc	file
.tcshrc	file
anaconda-ks.cfg	file
.lesshst	file
.viminfo	file
.cache	dir	Not Empty
.bash_history	file
.python_history	file
.local	dir	Not Empty
未命名.ipynb	file
.ipynb_checkpoints	dir	Not Empty
.ipython	dir	Not Empty
未命名1.ipynb	file
test2	file
test	file
.jupyter	dir	Not Empty
a	dir	Not Empty
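
Since pathlib offers no dedicated emptiness check, the flag loop above can be condensed with any(). The following is a minimal sketch, assuming a temporary directory as the root; is_empty_dir is a hypothetical helper, not a pathlib method. It also exercises mkdir with parents and exist_ok.

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# parents=True creates missing parents, like mkdir -p.
(root / "a" / "b").mkdir(parents=True)
# exist_ok=True suppresses the error for an existing directory.
(root / "a" / "b").mkdir(parents=True, exist_ok=True)
try:
    (root / "a" / "b").mkdir()
    mkdir_again_failed = False
except FileExistsError:
    mkdir_again_failed = True

(root / "empty").mkdir()
(root / "readme.txt").touch()


def is_empty_dir(path):
    """Hypothetical helper: True if path is a directory with no entries."""
    return path.is_dir() and not any(path.iterdir())


report = {}
for entry in root.iterdir():
    if entry.is_dir():
        report[entry.name] = "dir, empty" if is_empty_dir(entry) else "dir, not empty"
    elif entry.is_file():
        report[entry.name] = "file"
    else:
        report[entry.name] = "other"

print(report)
```

any() stops at the first entry that iterdir() yields, so the check does not list the whole directory.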

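Wildcards

glob(pattern) matches the given pattern against the entries of the directory
rglob(pattern) matches the given pattern recursively through subdirectories
Both return a generator

p = Path('/root')
print(p)
# list(p.glob('test*')) # returns the entries directly under this directory whose names start with test
list(p.glob('**/*.py')) # recurses through all directories; equivalent to rglob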

/root





[PosixPath('/root/1.py'), PosixPath('/root/a/2.py')]

g = p.rglob('*.py')
next(g)

PosixPath('/root/1.py')
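
The relationship between glob and rglob can be checked on a small directory tree. The following is a minimal sketch, assuming a temporary directory populated with hypothetical file names; results are sorted because the iteration order is not guaranteed.

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
(root / "a").mkdir()
for rel in ("1.py", "test.txt", "a/2.py"):   # hypothetical file names
    (root / rel).touch()

# glob('*.py') only looks at the directory itself.
top_level = sorted(x.name for x in root.glob("*.py"))

# glob('**/*.py') and rglob('*.py') both descend into subdirectories.
recursive = sorted(x.relative_to(root).as_posix() for x in root.glob("**/*.py"))
via_rglob = sorted(x.relative_to(root).as_posix() for x in root.rglob("*.py"))

# Both return generators, so results are produced lazily.
lazy = root.rglob("*.py")
first = next(lazy)

print(top_level, recursive, via_rglob, first.name)
```

The paths that glob yields are prefixed with the path it was called on, which is why relative_to(root) is used to compare them.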

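Matching

match(pattern): matches the path against a pattern and returns True on success. A relative pattern is matched from the right-hand end of the path

Path('a/b.py').match('*.py') #True
Path('a/b/c.py').match('b/*.py') #True
Path('a/b/c.py').match('a/*.py') #False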
Path('a/b/c.py').match('a/*/*.py') #True
Path('a/b/c.py').match('a/**/*.py') #True
Path('a/b/c.py').match('**/*.py') #True
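
Because match works from the right for relative patterns, the results are easy to misjudge. The following is a minimal sketch that uses PurePosixPath so the outcome does not depend on the platform; the paths are illustrative only.

```python
from pathlib import PurePosixPath

# Relative patterns are matched against the trailing components.
tail_only = PurePosixPath("a/b.py").match("*.py")
tail_two = PurePosixPath("/a/b/c.py").match("b/*.py")
wrong_parent = PurePosixPath("/a/b/c.py").match("a/*.py")

# An absolute pattern must match the whole path, which must be absolute too.
absolute_ok = PurePosixPath("/a.py").match("/*.py")
absolute_miss = PurePosixPath("a/b.py").match("/*.py")

print(tail_only, tail_two, wrong_parent, absolute_ok, absolute_miss)
```

The third call is False because the component directly before c.py is b, not a.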

stat() is the equivalent of the stat command
lstat() works like stat(), but for a symbolic link it reports information about the link itself

!ln -s test t
from pathlib import Path
p = Path('test')
p.stat()
p1 = Path('t')
p1.stat()
p1.lstat()

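File operations

open(mode='r', buffering=-1, encoding=None, errors=None, newline=None): works like the built-in open function and returns a file object
New methods added in 3.5:
read_bytes(): reads the file at the path in 'rb' mode and returns its contents as bytes (see the source code)
read_text(encoding=None, errors=None): reads the file at the path in 'rt' mode and returns its text
Path.write_bytes(data): writes data to the file at the path in 'wb' mode
write_text(data, encoding=None, errors=None): writes a string to the file at the path in 'wt' mode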

p = Path('my_binary_file')
p.write_bytes(b'Binary file contents')
p.read_bytes() #b'Binary file contents'

p = Path('my_text_file')
p.write_text('Text file contents')
p.read_text() #'Text file contents'

from pathlib import Path
p = Path('o:/test.py')
p.write_text('hello python')
print(p.read_text())
with p.open() as f:
    print(f.read(5))
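
The convenience methods open, write, and close the file in a single call, so no explicit close is needed. The following is a minimal sketch, assuming a temporary directory and hypothetical file names; the encoding is passed explicitly so that the result does not depend on the platform default.

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
text_file = root / "my_text_file"        # hypothetical file names
binary_file = root / "my_binary_file"

# write_text returns the number of characters written.
chars_written = text_file.write_text("你好 python", encoding="utf-8")
text = text_file.read_text(encoding="utf-8")
raw = text_file.read_bytes()             # the same content as undecoded bytes

# write_bytes returns the number of bytes written.
bytes_written = binary_file.write_bytes(b"\x00\x01")

# Path.open works like the built-in open and supports with.
with text_file.open(encoding="utf-8") as f:
    head = f.read(2)

print(chars_written, text, len(raw), bytes_written, head)
```

These helpers read or write the whole file at once, so Path.open remains the better fit for large files or incremental processing.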