Notes on split and re.split / capturing and non-capturing groups / startswith, endswith, and fnmatch / finditer

split() divides a string on a separator:

>>> a = 'a b c d'
>>> a.split(' ')
['a', 'b', 'c', 'd']
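
One detail worth a quick illustration: passing an explicit ' ' separator and passing no separator behave differently when the input has repeated spaces. The input string below is a hypothetical sample, not taken from the notes above.

```python
# Hypothetical input with runs of two and three spaces.
text = 'a  b   c'

# Explicit separator: every single space is a split point,
# so empty strings appear between adjacent spaces.
by_space = text.split(' ')

# No separator: any run of whitespace counts as one split point.
by_any = text.split()

print(by_space)
print(by_any)
```

When the spacing in the input is not guaranteed to be uniform, the no-argument form is usually the safer choice.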

For more complex delimiters, use re.split():

>>> import re
>>> re.split(r'[;,.\s]', a)
['a', 'b', 'c', 'd']
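
As a rough sketch of where re.split() helps, the example below splits a line that mixes several delimiters with uneven spacing. The sample line and the exact pattern are assumptions chosen for illustration.

```python
import re

# Hypothetical sample line mixing ';', ',' and spaces.
line = 'asdf fjdk; afed, fjek,asdf, foo'

# Split on ';', ',' or a whitespace character,
# each followed by any amount of extra whitespace.
fields = re.split(r'[;,\s]\s*', line)

print(fields)
```

The trailing \s* absorbs the space after each delimiter, so no empty fields are produced for this input.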

Capturing and non-capturing groups

>>> a
'a; b, c. d f'
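>>> re.split(r'(;|,|\.|\s)\s*', a) # capturing group (the text matched inside the parentheses is included in the result)
['a', ';', 'b', ',', 'c', '.', 'd', ' ', 'f']
>>> re.split(r'(?:;|,|\.|\s)\s*', a) # non-capturing group (the text matched inside the parentheses is not included in the result)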
['a', 'b', 'c', 'd', 'f']
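
One possible use of the capturing form is to keep the delimiters so the string can be reassembled later. The sketch below assumes the same sample string as above; the variable names are illustrative only.

```python
import re

text = 'a; b, c. d f'

# With a capturing group, fields and delimiters alternate in the result.
parts = re.split(r'(;|,|\.|\s)\s*', text)

values = parts[::2]       # even positions: the fields
delimiters = parts[1::2]  # odd positions: the captured delimiters

# Pair each field with the delimiter that followed it (none after the last).
rebuilt = ''.join(v + d for v, d in zip(values, delimiters + ['']))

print(values)
print(delimiters)
print(rebuilt)
```

Note that the whitespace matched by \s* sits outside the group, so it is not captured and does not reappear in the rebuilt string.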

startswith, endswith, and fnmatch

startswith() checks whether a string begins with the given prefix:
>>> a = 'index.py'
>>> a.startswith('in')
True

endswith() checks whether a string ends with the given suffix:
>>> a = 'index.py'
>>> a.endswith('py')
True
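
Both methods also accept a tuple of candidates, which is handy when checking several prefixes or suffixes at once. A minimal sketch, assuming a hypothetical list of file names:

```python
# Hypothetical file names, used only for illustration.
filenames = ['index.py', 'main.c', 'util.h', 'README']

# The argument must be a tuple; passing a list raises TypeError.
c_sources = [name for name in filenames if name.endswith(('.c', '.h'))]

# startswith() accepts a tuple of prefixes in the same way.
picked = [name for name in filenames if name.startswith(('in', 'RE'))]

print(c_sources)
print(picked)
```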

fnmatch() matches a string against a shell-style wildcard pattern:
>>> from fnmatch import fnmatch
>>> fnmatch('index.py', '*.py')
True
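Note: fnmatch() behaves differently on Windows and Linux, because it normalizes case according to the operating system's file name rules (via os.path.normcase())
# On Windows the match succeeds (case-insensitive)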
>>> fnmatch('index.py', '*.PY')
True
# On Linux the match fails (case-sensitive)
>>> from fnmatch import fnmatch
>>> fnmatch('index.py', '*.py')
True
>>> fnmatch('index.py', '*.PY')
False

To avoid this platform difference, use fnmatchcase(), which is always strictly case-sensitive:

>>> from fnmatch import fnmatchcase
>>> fnmatchcase('index.py', '*.py')
True
>>> fnmatchcase('index.py', '*.PY')
False
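
If case-insensitive matching is wanted on every platform, one possible approach (an assumption here, not something the notes above prescribe) is to lowercase the name before calling fnmatchcase(). The file names below are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only for illustration.
names = ['index.py', 'SETUP.PY', 'notes.txt']

# Strict matching: same result on every operating system.
exact = [n for n in names if fnmatchcase(n, '*.py')]

# Portable case-insensitive matching: fold the name to lowercase first
# and keep the pattern in lowercase.
folded = [n for n in names if fnmatchcase(n.lower(), '*.py')]

print(exact)
print(folded)
```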

finditer() returns all matches as an iterator of match objects:

>>> import re
>>> a = 'ahd; ncc,slf sa. e'
>>> patt1 = re.compile(r'[a-z]+')
>>> for i in patt1.finditer(a):
...     print(i)
...
<re.Match object; span=(0, 3), match='ahd'>
<re.Match object; span=(5, 8), match='ncc'>
<re.Match object; span=(9, 12), match='slf'>
<re.Match object; span=(13, 15), match='sa'>
<re.Match object; span=(17, 18), match='e'>
>>> print(type(patt1.finditer(a)))
<class 'callable_iterator'>
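
Each item produced by finditer() is a match object, so the matched text and its position can be read with group() and span(). A short sketch reusing the same string and pattern as above:

```python
import re

text = 'ahd; ncc,slf sa. e'
pattern = re.compile(r'[a-z]+')

# Collect (matched text, (start, end)) for every match.
words = [(m.group(), m.span()) for m in pattern.finditer(text)]

print(words)
```

Because finditer() is lazy, matches are produced one at a time, which can matter for long inputs where building the full list with findall() is not needed.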

Of course, if the goal is only to match files, the glob module is a better choice.
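
A minimal sketch of the glob approach follows. It creates a few hypothetical files in a temporary directory so the example does not depend on any existing files; the file names are assumptions for illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create some empty, hypothetical files to search through.
    for name in ('index.py', 'util.py', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()

    # glob.escape() protects any special characters in the directory path;
    # only the '*.py' part is treated as a wildcard.
    search = os.path.join(glob.escape(tmp), '*.py')

    # glob.glob() returns paths in arbitrary order, so sort for stable output.
    matches = sorted(os.path.basename(p) for p in glob.glob(search))

print(matches)
```

Unlike fnmatch(), which only compares strings, glob.glob() looks at the file system and returns the paths that actually exist.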