pythonday05正则表达式

相关参考文档地址:http://bbs.fishc.com/thread-57073-1-1.html（小甲鱼论坛）

摘录老师之精华

re模块用于对python的正则表达式的操作。

字符：

　　. 匹配除换行符以外的任意字符
　　\w 匹配字母或数字或下划线或汉字
　　\s 匹配任意的空白符
　　\d 匹配数字
　　\b 匹配单词的开始或结束
　　^ 匹配字符串的开始
　　$ 匹配字符串的结束

次数：

　　* 重复零次或更多次
　　+ 重复一次或更多次
　　? 重复零次或一次
　　{n} 重复n次
　　{n,} 重复n次或更多次
　　{n,m} 重复n到m次

IP：
^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$
手机号：
^1[3|4|5|8][0-9]\d{8}$

1、match(pattern, string, flags=0)

从起始位置开始根据模型去字符串中匹配指定内容，匹配单个

正则表达式
要匹配的字符串
标志位，用于控制正则表达式的匹配方式

1 import re
2 
3 obj = re.match('\d+', '123uuasf')
4 if obj:
5     print obj.group()

View Code

flags

1 # flags
2 I = IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE # ignore case
3 L = LOCALE = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
4 U = UNICODE = sre_compile.SRE_FLAG_UNICODE # assume unicode locale
5 M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
6 S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline
7 X = VERBOSE = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments

2、search(pattern, string, flags=0)

根据模型去字符串中匹配指定内容，匹配单个

1 import re
2 
3 obj = re.search('\d+', 'u123uu888asf')
4 if obj:
5     print obj.group()

View Code

3、group和groups

1 a = "123abc456"
2 print re.search("([0-9]*)([a-z]*)([0-9]*)", a).group()
3 
4 print re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(0)
5 print re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1)
6 print re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(2)
7 
8 print re.search("([0-9]*)([a-z]*)([0-9]*)", a).groups()

View Code

4、findall(pattern, string, flags=0)

上述两中方式均用于匹配单值，即：只能匹配字符串中的一个，如果想要匹配到字符串中所有符合条件的元素，则需要使用 findall。

1 import re
2 
3 obj = re.findall('\d+', 'fa123uu888asf')
4 print obj

View Code

5、sub(pattern, repl, string, count=0, flags=0)

用于替换匹配的字符串

content = "123abc456"
new_content = re.sub('\d+', 'sb', content)
# new_content = re.sub('\d+', 'sb', content, 1)
print new_content

相比于str.replace功能更加强大

6、split(pattern, string, maxsplit=0, flags=0)

根据指定匹配进行分组

示例

 1 content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
 2 new_content = re.split('\*', content)
 3 # new_content = re.split('\*', content, 1)
 4 print new_content
 5 content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
 6 new_content = re.split('[\+\-\*\/]+', content)
 7 # new_content = re.split('\*', content, 1)
 8 print new_content
 9 inpp = '1-2*((60-30 +(-40-5)*(9-2*5/3 + 7 /3*99/4*2998 +10 * 568/14 )) - (-4*3)/ (16-3*2))'
10 inpp = re.sub('\s*','',inpp)
11 new_content = re.split('\(([\+\-\*\/]?\d+[\+\-\*\/]?\d+){1}\)', inpp, 1)
12 print new_content

View Code

相比于str.split更加强大