[Python Web Scraping] Getting Started with the re (Regular Expression) Library

The concept of regular expressions

Regular expression syntax
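
As a small illustration of the syntax, the sketch below breaks down the pattern used in every exercise in this post: `[1-9]` matches one digit from 1 to 9, and `\d{5}` matches exactly five more digits, so together they describe a six-digit postal code that does not start with 0. The helper name `is_zip` is hypothetical and is not part of the re library.

```python
import re

# [1-9]  : one digit from 1 to 9 (no leading zero)
# \d{5}  : exactly five further digits
ZIP = r'[1-9]\d{5}'

def is_zip(s):
    """Return True if the whole string is a six-digit postal code (hypothetical helper)."""
    return re.fullmatch(ZIP, s) is not None

print(is_zip('100081'))  # True
print(is_zip('010081'))  # False: starts with 0
print(is_zip('10008'))   # False: only five digits
```

Note the raw string prefix `r`: without it, the backslash in `\d` would have to be escaped by hand.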

Basic usage of the re library

Exercise:

>>> import re
>>> # search() scans the whole string and returns the first match
>>> match = re.search(r'[1-9]\d{5}', 'BIT 100081')
>>> if match:
    print(match.group(0))

100081

Exercise:

>>> import re
>>> # match() only tries to match at the start of the string, so this returns None
>>> match = re.match(r'[1-9]\d{5}', 'BIT 100081')
>>> if match:
    match.group(0)

>>> match.group(0)
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    match.group(0)
AttributeError: 'NoneType' object has no attribute 'group'
>>> # With the postal code at the start, match() succeeds
>>> match = re.match(r'[1-9]\d{5}', '100081 BIT')
>>> if match:
    match.group(0)

'100081'
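
The difference between the two exercises above can be put side by side in a short script. This is only a sketch: `first_zip` and its `anchored` parameter are hypothetical names introduced for illustration, not part of the re library. It also shows the `None` check that avoids the `AttributeError` seen above.

```python
import re

# Six-digit postal code pattern used in the exercises.
ZIP_PATTERN = r'[1-9]\d{5}'

def first_zip(text, anchored=False):
    """Return the first postal code in text, or None if nothing matches.

    Hypothetical helper: with anchored=True it uses re.match, which only
    tries position 0; otherwise it uses re.search, which scans the string.
    """
    finder = re.match if anchored else re.search
    m = finder(ZIP_PATTERN, text)
    # Both functions return None on failure, so guard before calling group().
    return m.group(0) if m else None

print(first_zip('BIT 100081'))                 # 100081
print(first_zip('BIT 100081', anchored=True))  # None
print(first_zip('100081 BIT', anchored=True))  # 100081
```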

Exercise:

>>> import re
>>> # findall() returns every matching substring as a list
>>> ls = re.findall(r'[1-9]\d{5}', 'BIT 100081 TSU100084')
>>> ls
['100081', '100084']

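Exercise:

>>> import re
>>> # split() cuts the string at every match and returns the remaining pieces
>>> re.split(r'[1-9]\d{5}', 'BIT100081 TSU100084')
['BIT', ' TSU', '']
>>> # maxsplit=1 splits only at the first match; the rest is returned as the last element
>>> re.split(r'[1-9]\d{5}', 'BIT100081 TSU100084', maxsplit=1)
['BIT', ' TSU100084']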

Exercise:

>>> import re
>>> # finditer() returns an iterator that yields one match object per match
>>> for m in re.finditer(r'[1-9]\d{5}', 'BIT100081 TSU100084'):
    if m:
        print(m.group(0))

100081
100084

Exercise:

>>> import re
>>> # sub() replaces every match with the given string
>>> re.sub(r'[1-9]\d{5}', ':zipcode', 'BIT100081 TSU100084')
'BIT:zipcode TSU:zipcode'
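
The exercises above call the module-level functions directly. The re library also allows the pattern to be compiled once with `re.compile` and reused through methods of the same names. The sketch below repeats three of the exercises that way; the variable names are chosen only for illustration.

```python
import re

# Compile the postal code pattern once and reuse it for several operations.
zip_re = re.compile(r'[1-9]\d{5}')
text = 'BIT100081 TSU100084'

found = zip_re.findall(text)            # all matches as a list
parts = zip_re.split(text, maxsplit=1)  # split at the first match only
masked = zip_re.sub(':zipcode', text)   # replace every match

print(found)   # ['100081', '100084']
print(parts)   # ['BIT', ' TSU100084']
print(masked)  # BIT:zipcode TSU:zipcode
```

Compiling is mainly a convenience when the same pattern is used many times; the results are the same as with the module-level calls.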

The Match object in the re library

>>> import re
>>> match = re.search(r'[1-9]\d{5}', 'BIT 100081')
>>> if match:
    print(match.group(0))

100081
>>> # A successful search returns a re.Match object
>>> type(match)
<class 're.Match'>

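Exercise:

>>> import re
>>> m = re.search(r'[1-9]\d{5}', 'BIT 100081 TSU100084')
>>> m.string    # the text that was searched
'BIT 100081 TSU100084'
>>> m.re        # the compiled pattern that produced the match
re.compile('[1-9]\\d{5}')
>>> m.pos       # where the search started
0
>>> m.endpos    # where the search was allowed to end
20
>>> m.group(0)  # the matched substring
'100081'
>>> m.start()
4
>>> m.end()
10
>>> m.span()
(4, 10)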
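
`re.search` only returns the first match, so the attributes above describe `'100081'` alone. As a rough sketch, the same match-object methods can be combined with `re.finditer` to report the position of every postal code in the string; the variable names here are illustrative.

```python
import re

text = 'BIT 100081 TSU100084'

# Collect (matched text, (start, end)) for every match in the string.
spans = [(m.group(0), m.span()) for m in re.finditer(r'[1-9]\d{5}', text)]

for code, (start, end) in spans:
    # Slicing the original string with the span gives back the matched text.
    print(code, start, end, text[start:end] == code)
```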

Greedy and minimal matching in the re library
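
A minimal sketch of the difference, using a sample string chosen for illustration: by default the quantifier `*` is greedy and matches as much as possible, while adding `?` after it (`*?`) makes it match as little as possible.

```python
import re

text = 'PYANBNCNDN'

# Greedy: .* takes as many characters as it can before the final N.
greedy = re.search(r'PY.*N', text).group(0)

# Minimal: .*? stops at the first N that lets the pattern succeed.
minimal = re.search(r'PY.*?N', text).group(0)

print(greedy)   # PYANBNCNDN
print(minimal)  # PYAN
```

The same `?` suffix works on the other quantifiers as well: `+?`, `??`, and `{m,n}?`.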

Original article: https://www.cnblogs.com/HGNET/p/13272842.html