Python Basics

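This post mainly covers two built-in modules for file handling, os and pathlib, with a stop at glob along the way. The previous post more or less settled reading and writing files. Of course, since I work with data, my day-to-day isn't just reading and writing a file. The focus is on all the complex operations after the data has been read, and for that I mostly use libraries like pandas for data analysis. pandas is so powerful that it's actually more flexible and capable than Excel, and it used to be my go-to tool for office automation.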

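Back to the topic: file handling. It's usually tedious chores like getting all the files in a folder, pulling out specific files, reading data, and creating folders and files. None of it is really hard; once the logic is clear, you can look things up as you go.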

os

Just the functions I use most often.

Common operations: create, delete, modify, query

import os

# Get the current working directory
os.getcwd()

'E:\\Jupyter notes\\Python_data_struct'

# List everything (files and folders) in a directory; with no argument, the current one
os.listdir()

['.git', '.gitignore', '.idea', 'pythonds', 'test.py', 'test2.py', 'test副本py']

# Delete a file
os.remove('test副本py')

os.listdir()

['.git', '.gitignore', '.idea', 'pythonds', 'test.py', 'test2.py']
    
# Check whether a file exists
if os.path.exists("hello.py"):
    print("yes")
else:
    print('no')

no

# Create a folder
os.mkdir('my_dir')

os.listdir()
['.git', '.gitignore', '.idea', 'my_dir', 'pythonds', 'test.py', 'test2.py']

# Change the working directory (path is the target directory, e.g. 'my_dir')
os.chdir(path)
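To try the create/delete/modify/query calls above without touching your real working directory, here is a minimal sketch that runs them inside a temporary folder. The function name crud_round_trip and the file names are made up for illustration; os.rmdir() (not shown above) removes an empty folder.

```python
import os
import tempfile


def crud_round_trip():
    """Create, check, rename, list and delete inside a throwaway directory."""
    # TemporaryDirectory is used so nothing in your real working directory changes
    with tempfile.TemporaryDirectory() as base:
        folder = os.path.join(base, 'my_dir')
        os.mkdir(folder)                     # create a folder (FileExistsError if it exists)

        path = os.path.join(folder, 'hello.py')
        with open(path, 'w') as f:           # create a file
            f.write('print("hello")\n')
        created = os.path.exists(path)       # check that it exists

        new_path = os.path.join(folder, 'hello2.py')
        os.rename(path, new_path)            # modify: rename the file
        listing = os.listdir(folder)         # query: what is in the folder now

        os.remove(new_path)                  # delete the file
        os.rmdir(folder)                     # delete the folder (must be empty)
        gone = not os.path.exists(folder)
    return created, listing, gone


print(crud_round_trip())  # (True, ['hello2.py'], True)
```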

Example: checking file types

import os


def file_check():
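    # List everything in the current directory
    file_list = os.listdir(os.getcwd())

    if not file_list:
        return

    for file in file_list:
        # Check islink first: isdir()/isfile() follow symlinks, so a link
        # to a folder or file would otherwise never reach the islink branch
        if os.path.islink(file):
            print(file, 'is a link')

        elif os.path.isdir(file):
            print(file, 'is a dir')

        elif os.path.isfile(file):
            print(file, 'is a file')
        else:
            print(file, 'is something else')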


if __name__ == '__main__':
    file_check()

.git is a dir
.gitignore is a file
.idea is a dir
my_dir is a dir
pythonds is a dir
test.py is a file
test2.py is a file

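This is the most direct use. If you've used Linux, you know the most common command is ls, and os.listdir() does exactly the same job. Then you check whether each entry is a file or a folder, and if it's a folder you ls into it again... That is a recursive process. Batch-renaming files, recursively copying a multi-level directory tree: they all follow the same logic and are pretty simple, so I won't write them out. Just keep the idea in mind; the main point is to have a rough idea of what the libraries offer.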
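The "ls, then ls again into each folder" idea can be sketched as a small recursive function. This is a minimal sketch, not a library API; the name list_files_recursive is made up for illustration, and symlinked folders are skipped to avoid loops.

```python
import os


def list_files_recursive(root):
    """Return the paths of all regular files under root, descending into subfolders."""
    result = []
    for name in sorted(os.listdir(root)):
        # listdir() returns bare names, so join them with root before checking;
        # os.path.isdir(name) alone would look in the current directory instead
        path = os.path.join(root, name)
        if os.path.isdir(path) and not os.path.islink(path):
            result.extend(list_files_recursive(path))   # a folder: "ls" into it again
        elif os.path.isfile(path):
            result.append(path)
    return result
```

os.walk() in the standard library does this traversal for you; the hand-written version just makes the recursion visible.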

glob

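This built-in module is made specifically for finding particular files, and it's incredibly powerful. The docs describe it like this, in plenty of detail: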

Take the example above: with glob you don't have to walk the whole directory and check every file for a match.

The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched. This is done by using the os.scandir() and fnmatch.fnmatch() functions in concert, and not by actually invoking a subshell. Note that unlike fnmatch.fnmatch(), glob treats filenames beginning with a dot (.) as special cases. (For tilde and shell variable expansion, use os.path.expanduser() and os.path.expandvars().)

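I had a look at its source code: it's essentially a wrapper around os.scandir() and fnmatch (which in turn compiles patterns with re), so nothing all that impressive.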

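"*" matches zero or more characters;
"?" matches exactly one character;
"[ ]" matches one character from the given set or range, e.g. [0-9] matches a digit.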
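The three wildcards can be tried without any files on disk via the fnmatch module, which the quoted docs say glob uses for matching. This sketch uses fnmatch.fnmatchcase(), the case-sensitive variant, on a made-up list of names. Unlike glob, fnmatch does not treat names beginning with a dot specially.

```python
from fnmatch import fnmatchcase

names = ['123.txt', 'a.txt', 'ab.txt', 'test.py', 'test2.py']

# "*": zero or more characters
print([n for n in names if fnmatchcase(n, '*.txt')])         # ['123.txt', 'a.txt', 'ab.txt']
# "?": exactly one character, so 'ab.txt' does not match
print([n for n in names if fnmatchcase(n, '?.txt')])         # ['a.txt']
# "[0-9]": one character from the range, so 'test.py' does not match
print([n for n in names if fnmatchcase(n, 'test[0-9].py')])  # ['test2.py']
```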

import glob

# Match everything in the current directory (names starting with . are skipped)
glob.glob("*")

['123.txt', 'my_dir', 'pythonds', 'test.py', 'test2.py']

# Everything inside my_dir; if my_dir doesn't exist you get [] instead of an error
glob.glob('my_dir/*')

['my_dir\\a.txt', 'my_dir\\b.txt', 'my_dir\\c_dir']

# Match all files in the current directory ending in .txt
glob.glob('*.txt')

['123.txt', '测试.txt']

# .txt files whose names start with 12
glob.glob('12*.txt')

['123.txt']


# iglob() returns an iterator, so matches are produced one at a time instead of all at once, which saves memory
g = glob.iglob('*.txt')

g
<generator object _iglob at 0x000001E652CFA360>

list(g)

['123.txt', '测试.txt']
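All the patterns above stay inside one folder. For the recursive case from the os section, glob can also descend into subfolders: with recursive=True, "**" matches any files and zero or more directories. A minimal sketch; the helper name find_txt_files is made up for illustration.

```python
import glob
import os


def find_txt_files(root):
    """Return every .txt file under root, at any depth, in sorted order."""
    # "**" means "zero or more subdirectories" only when recursive=True;
    # without it, "**" behaves like a plain "*"
    pattern = os.path.join(root, '**', '*.txt')
    return sorted(glob.glob(pattern, recursive=True))


for path in find_txt_files('my_dir'):
    print(path)
```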

That's about it for the everyday file operations.

pathlib

pathlib, available since Python 3.4, is described as "object-oriented filesystem paths". The part I use most is its Path class.

For low-level path string manipulation, you can also use the os.path module.

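It's not really an upgrade to os.path so much as a more Pythonic way of doing the same things. I grabbed a few examples online for reference; when you need more, check the docs.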
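To make "more Pythonic" concrete, here is a side-by-side sketch of a few common operations in both styles. It uses PurePosixPath and posixpath (the POSIX flavours of pathlib and os.path) only so the output is identical on every OS; in everyday code you would use Path and os.path. The example path is made up.

```python
import posixpath
from pathlib import PurePosixPath

s = 'src/demo/1234.txt'
p = PurePosixPath(s)

comparison = {
    # operation:     (pathlib style,                os.path style)
    'file name':     (p.name,                       posixpath.basename(s)),
    'extension':     (p.suffix,                     posixpath.splitext(s)[1]),
    'name w/o ext':  (p.stem,                       posixpath.splitext(posixpath.basename(s))[0]),
    'parent dir':    (str(p.parent),                posixpath.dirname(s)),
    'sibling file':  (str(p.parent / 'other.txt'),  posixpath.join(posixpath.dirname(s), 'other.txt')),
}

for op, (oo_style, str_style) in comparison.items():
    print(f'{op:14} {oo_style!r:24} {str_style!r}')
```

The "/" operator on Path objects replaces most os.path.join() calls, which is a big part of why the object style reads more naturally.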

The confusing os.path.join()

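Every time I join strings I have to test it first, because I've never managed to remember which way round it goes. Same with json's dumps() and loads(): never remembered, always tested.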

# String -> list => split()
lst = "I am little prince".split()
print(lst)

# List -> string => join()
my_str = ', '.join(lst)
print(my_str)

['I', 'am', 'little', 'prince']
I, am, little, prince
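For dumps()/loads(), one mnemonic that may help (a personal reading, not an official rule) is that the trailing s stands for "string". dumps() turns a Python object into a JSON string and loads() turns a JSON string back into a Python object, while dump() and load() without the s work with file objects instead. The data below is made up.

```python
import json

data = {'name': 'little prince', 'tags': ['I', 'am']}

# dumps: Python object -> JSON string ("dump to string")
text = json.dumps(data)
print(type(text), text)

# loads: JSON string -> Python object ("load from string")
back = json.loads(text)
print(type(back), back == data)
```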

import os.path

# Create the folder src with a demo folder inside it
os.makedirs(os.path.join('src', 'demo'), exist_ok=True)

# Raises FileNotFoundError if 123.txt doesn't exist. Renames it and moves it to src/1234.txt in one step
os.rename('123.txt', os.path.join('src', '1234.txt'))
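Part of what makes os.path.join() confusing is how different it is from str.join(), and one gotcha with absolute paths. This sketch calls posixpath and ntpath directly (the modules os.path points to on Linux/macOS and on Windows respectively) so both separators can be seen on any OS; the path parts are made up.

```python
import ntpath
import posixpath

parts = ['src', 'demo', '123.txt']

# str.join: separator first, one iterable argument
print('/'.join(parts))              # src/demo/123.txt

# os.path.join: separate arguments (unpack a list with *), separator chosen by the OS
print(posixpath.join(*parts))       # src/demo/123.txt  (os.path on Linux/macOS)
print(ntpath.join(*parts))          # src\demo\123.txt  (os.path on Windows)

# Gotcha: an absolute component discards everything before it
print(posixpath.join('src', '/tmp', 'a.txt'))   # /tmp/a.txt
```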

join() really is an easy function to mix up. Doing the same thing with pathlib is much more elegant.

from pathlib import Path

Path('src/demo').mkdir(parents=True, exist_ok=True)

# Rename the file and move it (same as the os version above, so run one or the other)
Path('123.txt').rename('src/1234.txt')

This object-oriented style is a real pleasure to use.
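The batch renaming mentioned earlier is a good fit for this style. A minimal sketch, assuming you want to add a prefix to every matching file in one folder; add_prefix and its parameters are hypothetical names made up for illustration.

```python
from pathlib import Path


def add_prefix(folder, prefix, pattern='*.txt'):
    """Rename every file in folder matching pattern to prefix + original name."""
    renamed = []
    # sorted() collects all matches first, so renaming doesn't disturb the iteration
    for p in sorted(Path(folder).glob(pattern)):
        if p.is_file():
            # with_name() keeps the folder and swaps only the final component
            target = p.with_name(prefix + p.name)
            p.rename(target)
            renamed.append(target.name)
    return renamed
```

For example, add_prefix('my_dir', 'old_') would turn a.txt into old_a.txt and leave non-.txt files alone.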

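# Path is a subclass of PurePath; the results below are absolute because Path.cwd() is absolute. I wanted relative paths and haven't worked out how yet.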

g = Path.cwd().glob('*.txt')

list(g)
[WindowsPath('E:/Jupyter notes/Python_data_struct/测试.txt')]
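The absolute paths above come from globbing on Path.cwd(), which is itself absolute. A minimal sketch of one way to get relative paths instead, assuming you want them relative to the folder being searched: strip the prefix with Path.relative_to(). The helper name txt_files_relative is made up for illustration.

```python
from pathlib import Path


def txt_files_relative(folder):
    """Return the .txt files directly inside folder, relative to folder."""
    base = Path(folder).resolve()          # absolute, like Path.cwd()
    # glob() on an absolute Path yields absolute Paths ...
    matches = base.glob('*.txt')
    # ... so strip the base prefix from each one with relative_to()
    return sorted(p.relative_to(base) for p in matches)


print(txt_files_relative('.'))
```

Globbing on a relative path also works: list(Path('.').glob('*.txt')) yields paths relative to the current directory directly.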

... I'll skip further examples for now; just keep a rough idea of what's there.

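File handling really comes down to reading and writing files and checking files. Add the data processing in between and you've got batch file processing and automation. I think I'll start a separate series later devoted to office automation; that should work out. That's it for file operations for now.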