(5) 5-3 Python Regular Expressions

Methods of compiled pattern objects

1. The match method

import re
reg = re.compile(r'(hello w.*)(hello cn.*)')
a = 'hello world hello cnblogs'
result = reg.match(a)
print(result)

b = "aa" + a
print(b)
result2 = reg.match(b)
print(result2)

Output:

<_sre.SRE_Match object at 0x00000000025E21C8>
hello world hello cnblogs
aahello world hello cnblogs
None
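
To see what the two parenthesized parts of this pattern actually capture, here is a minimal sketch that reuses the same pattern and string. The variable names whole, first and second are only illustrative. Because .* is greedy, the first group is expected to run up to (and include) the space before the second "hello".

```python
import re

reg = re.compile(r'(hello w.*)(hello cn.*)')
result = reg.match('hello world hello cnblogs')

# group(0) is the whole match; groups() returns the parenthesized parts
whole = result.group(0)
first, second = result.groups()

print(repr(whole))
print(repr(first))   # note the trailing space captured by the greedy .*
print(repr(second))
```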

2. The search method

import re
reg = re.compile(r'(hello w.*)(hello cn.*)')
a = 'hello world hello cnblogs'
b = "aa" + a
print(b)
result3 = reg.search(b)
print(result3)
print(result3.group())

Output:

aahello world hello cnblogs
<_sre.SRE_Match object at 0x0000000002592140>
hello world hello cnblogs
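
The difference between the two methods can be sketched side by side. This is a small illustration with a simplified pattern (an assumption, not the pattern used above): match only succeeds at the starting position, search scans forward, and the optional pos argument shifts where match starts.

```python
import re

reg = re.compile(r'hello w\w*')
text = 'aahello world'

# match only succeeds if the pattern matches at the start of the string
m = reg.match(text)
# search scans forward and returns the first match anywhere in the string
s = reg.search(text)
# the optional pos argument moves the starting position used by match
m_at_2 = reg.match(text, 2)

print(m)
print(s.group(), s.span())
print(m_at_2.group())
```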

3. The split method of a pattern object

import re
p = re.compile(r'\d+')  # \d+ matches one or more digits
print(p.split('one1two2three3four4'))

Output:

['one', 'two', 'three', 'four', '']

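Note: the pattern p acts as the delimiter. The string is split at every match of p and the pieces are returned as a list. The trailing '' appears because the string ends with a delimiter (the final '4').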
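
If the trailing empty string is unwanted, or only the first few splits are needed, something like the following sketch may help. The names non_empty and limited are illustrative only; maxsplit is the optional second parameter of the pattern object's split method.

```python
import re

p = re.compile(r'\d+')
text = 'one1two2three3four4'

parts = p.split(text)
# drop empty pieces, such as the one produced by the trailing '4'
non_empty = [piece for piece in parts if piece]
# maxsplit limits the number of splits; the rest of the string stays intact
limited = p.split(text, maxsplit=2)

print(non_empty)
print(limited)
```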

4. The findall method of a pattern object
findall(string[,pos[,endpos]])
Searches string and returns all matching substrings as a list

import re
p = re.compile(r'\d+')  # \d+ matches one or more digits
print(p.findall('one1two2three3four4'))

Output:

['1', '2', '3', '4']

Note: findall returns all the matched substrings as a list.
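
The optional pos and endpos arguments in the signature restrict the part of the string that is searched. The sketch below also shows a second, hypothetical pattern (pairs) to illustrate that findall returns tuples when the pattern contains more than one group.

```python
import re

p = re.compile(r'\d+')
text = 'one1two2three3four4'

# pos=4 starts the search after 'one1'
from_pos = p.findall(text, 4)
# endpos=13 stops the search before the '3' at index 13
window = p.findall(text, 4, 13)

# with several groups in the pattern, each result is a tuple of the groups
pairs = re.compile(r'([a-z]+)(\d)').findall(text)

print(from_pos)
print(window)
print(pairs)
```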

5. The finditer method of a pattern object
finditer(string[,pos[,endpos]])
Searches string and returns an iterator that yields each result (a match object) in order

import re
p = re.compile(r'\d+')  # \d+ matches one or more digits
a_string = 'one1two2three3four4'
for i in p.finditer(a_string):
    # print(i)
    print(type(i))
    print(i.group())

Output:

<type '_sre.SRE_Match'>
1
<type '_sre.SRE_Match'>
2
<type '_sre.SRE_Match'>
3
<type '_sre.SRE_Match'>
4

Note: p.finditer(a_string) returns an iterator, and each item i it yields is a match object.
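
Because every item is a match object, finditer is handy when the position of each match matters, not only its text. A minimal sketch (the list name positions is illustrative):

```python
import re

p = re.compile(r'\d+')
a_string = 'one1two2three3four4'

# collect the matched text together with its start and end offsets
positions = [(m.group(), m.start(), m.end()) for m in p.finditer(a_string)]

print(positions)
```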

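6. The sub method of a pattern object

sub(repl, string, count=0)
repl : the replacement string; it can also be a function that receives a match object and returns the replacement.
string : the original string to search and replace in.
count : the maximum number of replacements; the default 0 replaces all matches.

The module-level form, re.sub(pattern, repl, string, count=0), takes the pattern string as an extra first argument.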
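
As an illustration of these parameters, the following sketch applies sub to the same sample string used in the earlier sections. The helper function double is hypothetical and only shows how a function can be passed as repl.

```python
import re

p = re.compile(r'\d+')
text = 'one1two2three3four4'

# replace every run of digits with '-'
all_replaced = p.sub('-', text)
# count=2 replaces only the first two matches
first_two = p.sub('-', text, count=2)

def double(match):
    # repl as a function: receives a match object, returns the replacement text
    return str(int(match.group()) * 2)

doubled = p.sub(double, text)

print(all_replaced)
print(first_two)
print(doubled)
```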

Match objects

import re
reg = re.compile(r'(?P<tagname>abc)(.*)(?P=tagname)')
result = reg.match('abclfjasdasda234hhkhabc')

print(dir(result))
print(result)
print(result.groups())
print(result.group(2))
print(result.group('tagname'))
print('*'*10)
print(result.groupdict())

Output:

['__class__', '__copy__', '__deepcopy__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'end', 'endpos', 'expand', 'group', 'groupdict', 'groups', 'lastgroup', 'lastindex', 'pos', 're', 'regs', 'span', 'start', 'string']
<_sre.SRE_Match object at 0x0000000002662140>
('abc', 'lfjasdasda234hhkh')
lfjasdasda234hhkh
abc
**********
{'tagname': 'abc'}

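Notes:
1. result is a match object returned by reg.match, not a string.
2. result.groups() returns all captured groups as a tuple, one element per capturing (). The backreference (?P=tagname) is not a capturing group, so it adds no element.
3. group() accepts either an index (starting from 1; 0 or no argument gives the whole match) or a group name.
4. groupdict() only contains the groups that have a name.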
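
Besides the captured text, a match object also records where each group matched. The sketch below reuses the pattern and string from this section and queries the span, start and end methods that appear in the dir() listing; the variable names are illustrative.

```python
import re

reg = re.compile(r'(?P<tagname>abc)(.*)(?P=tagname)')
result = reg.match('abclfjasdasda234hhkhabc')

# span() accepts a group index or a group name, just like group()
name_span = result.span('tagname')
middle_span = result.span(2)
# with no argument, start() and end() refer to the whole match
whole_start, whole_end = result.start(), result.end()

print(name_span)
print(middle_span)
print(whole_start, whole_end)
```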