A Brief Summary of the Python re Module

Preface:

My environment: Windows 7 64-bit, Python 2.7.

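What is re:

re is short for "regular expression". It is one of the many modules in Python's standard library and provides regular expression matching operations.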

What re is for:

Selectively extracting the text fragments you want from a body of text, in bulk.

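Types of regex engines:

Regular expression engines are generally classified as DFA (deterministic finite automaton) or NFA (nondeterministic finite automaton) engines. Python's re module uses a backtracking, NFA-style engine.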
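A small sketch of one visible consequence of the backtracking design: for an alternation, Python's re returns the first alternative that succeeds rather than the longest one, so the order of the alternatives matters. The sample string is made up for illustration; a DFA-style (leftmost-longest) engine would typically report 'tonight' in both cases.

```python
import re

# Alternatives in "to|tonight" are tried from left to right; the engine
# stops at the first one that succeeds (backtracking NFA behaviour).
m = re.search(r'to|tonight', 'tonight')
print(m.group())  # → to

# Putting the longer alternative first changes the result.
m = re.search(r'tonight|to', 'tonight')
print(m.group())  # → tonight
```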

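Installing re:

No installation is needed. re is part of Python's standard library and ships with every Python installation, so running pip install re is unnecessary; just write import re at the top of your script.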

Common re functions:

1.re.compile(pattern, flags=0)

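  pattern is a str, e.g. pattern = r'^.*?$'. flags modifies how matching works, e.g. re.I (ignore case) or re.M (multiline). compile() returns a compiled pattern object whose methods (findall, split, match, search, and so on) work like the module-level functions below, so a pattern that is used many times only needs to be compiled once.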
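A minimal sketch of compiling the example pattern once and reusing it, assuming a made-up multi-line string. The re.M flag is added here (it is not in the original example) so that ^ and $ match at the start and end of every line instead of only the whole string.

```python
import re

# Compile once; the resulting pattern object can be reused many times.
# re.M (MULTILINE) makes ^ and $ match at the start/end of every line.
line_pat = re.compile(r'^.*?$', re.M)

text = 'first line\nsecond line\nthird line'
print(line_pat.findall(text))
# → ['first line', 'second line', 'third line']

# Pattern objects offer the same methods as the module-level functions,
# minus the pattern argument.
print(line_pat.match(text).group())  # → first line
```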

2.re.findall(pattern, string, flags=0)

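  Return a list of all non-overlapping matches in the string. If the pattern contains capturing groups, the list holds the groups instead of the whole match: one group gives a list of strings, and two or more groups give a list of tuples (compare examples 1 to 3).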

  Example 1: print(re.findall(r'(s)(d)', 'gsd sd fsa ggh sd hf sdgf'))

  Result: [('s', 'd'), ('s', 'd'), ('s', 'd'), ('s', 'd')]

  Example 2: print(re.findall(r'(s)d', 'gsd sd fsa ggh sd hf sdgf'))

  Result: ['s', 's', 's', 's']

  Example 3: print(re.findall(r'sd', 'gsd sd fsa ggh sd hf sdgf'))

  Result: ['sd', 'sd', 'sd', 'sd']

  Use case: extracting links and similar items from a web page's source code.
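A minimal sketch of the link-extraction use case, assuming a tiny hand-written HTML fragment and links written as href="..." with double quotes. A regex like this is fine for quick scraping, but real pages (single quotes, unusual spacing) are handled more reliably by an HTML parser.

```python
import re

# A small, made-up fragment of page source.
html = '<a href="http://example.com/">Home</a> <a href="/about">About</a>'

# One capturing group, so findall returns only the captured URLs.
# The non-greedy .*? stops at the first closing quote.
links = re.findall(r'href="(.*?)"', html)
print(links)  # → ['http://example.com/', '/about']
```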

3.re.split(pattern, string, maxsplit=0, flags=0)

  Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings.

  Example: print(re.split(r's', 'jsjkjoishioshuisguusnjshbsg'))

  Result: ['j', 'jkjoi', 'hio', 'hui', 'guu', 'nj', 'hb', 'g']

  Use case: breaking a large block of text into smaller pieces that are easier to process.
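A short sketch of split() with a character class as the separator, plus the maxsplit parameter from the signature above. The input strings are made up for illustration. The last call shows that a capturing group in the pattern keeps the separators in the result.

```python
import re

text = 'apple, banana;cherry  date'

# Split on any run of commas, semicolons or whitespace.
print(re.split(r'[,;\s]+', text))
# → ['apple', 'banana', 'cherry', 'date']

# maxsplit limits the number of splits; the rest stays in the last item.
print(re.split(r'[,;\s]+', text, maxsplit=1))
# → ['apple', 'banana;cherry  date']

# A capturing group in the pattern keeps the separators in the result.
print(re.split(r'([,;])', 'a,b;c'))
# → ['a', ',', 'b', ';', 'c']
```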

4.re.match(pattern, string, flags=0)

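  Try to apply the pattern at the start of the string, returning a match object, or None if no match was found. The match object is not the matched text itself: call its group() method to get the matched text (group(n) for the n-th capturing group), and start(), end() or span() to get its position in the string.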
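A small sketch of working with the match object returned by match(), using a made-up log-style string. Checking for None before calling group() avoids an AttributeError when nothing matches.

```python
import re

# match() only tries the pattern at position 0 of the string.
m = re.match(r'(\d+)-(\d+)', '2015-07 log')
if m:  # always check for None before using the match object
    print(m.group())   # whole match → 2015-07
    print(m.group(1))  # first group → 2015
    print(m.group(2))  # second group → 07
    print(m.span())    # (start, end) → (0, 7)

# The digits are not at the start here, so match() returns None.
print(re.match(r'\d+', 'log 2015'))  # → None
```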

5.re.search(pattern, string, flags=0)

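  Scan through string looking for a match to the pattern, returning a match object, or None if no match was found. Unlike match(), search() is not limited to the start of the string: it returns the first position anywhere in the string where the pattern matches, as a match object with the same group(), start(), end() and span() methods.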
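A minimal side-by-side sketch of match() versus search() on the same made-up string, showing that only search() finds digits that are not at the start.

```python
import re

text = 'order id: 12345, amount: 678'

# match() fails because the string does not start with digits ...
print(re.match(r'\d+', text))  # → None

# ... while search() scans ahead and returns the first match it finds.
m = re.search(r'\d+', text)
print(m.group())  # → 12345
print(m.start())  # → 10
```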

References: the help command in IDLE (e.g. help(re)).

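Postscript: corrections are welcome if you spot any errors or omissions; I will update this post when I have time.

This is an original post by the author. If you repost it, please credit the source and mention @我心飞翔2015. Thank you!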