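Python Modules: re (Regular Expressions)

41. Regular expressions in Python

Python's re module provides regular expression operations.

Characters:

  . matches any character except a newline

  \w matches a letter, digit, underscore, or Chinese character (for str patterns, any Unicode word character)

  \W (uppercase) matches anything that \w does not.

  \s matches any whitespace character

  \d matches a digit

  \b matches the start or end of a word, where a word is a run of consecutive letters, digits, and underscores.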

>>> re.findall(r'I\b','I am a bIoy')

['I']

>>> re.findall(r'I\b','I am a bIoyI')

['I', 'I']

  ^ matches the start of the string
  $ matches the end of the string

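Repetition:

  * repeats zero or more times
  + repeats one or more times
  ? repeats zero or one time
  {n} repeats exactly n times
  {n,} repeats n or more times

  {n,m} repeats between n and m times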

>>> s1 = 'alex'  # assumed value; the original definition of s1 is not shown

>>> re.findall('ale{1,5}x',s1)

['alex']

>>> s1 = 'sdfsoodsfjsdaleeeeex'

>>> re.findall('ale{1,5}x',s1)

['aleeeeex']
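To see how the quantifiers differ from one another, the following sketch tests each of them against the same four strings. The `samples` list and the `matching` helper are hypothetical names introduced only for this illustration; `re.fullmatch` is used so that a pattern counts only when it covers the whole string.

```python
import re

# Hypothetical samples: 'a', then zero to three 'e', then 'x'
samples = ["ax", "aex", "aeex", "aeeex"]

def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(matching(r"ae*x"))      # ['ax', 'aex', 'aeex', 'aeeex']
print(matching(r"ae+x"))      # ['aex', 'aeex', 'aeeex']
print(matching(r"ae?x"))      # ['ax', 'aex']
print(matching(r"ae{2}x"))    # ['aeex']
print(matching(r"ae{2,}x"))   # ['aeex', 'aeeex']
print(matching(r"ae{1,2}x"))  # ['aex', 'aeex']
```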

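[] matches any one of the characters it encloses. Metacharacters inside [] lose their special meaning, except for:

- (denotes a range), ^ (negates the set when it is the first character inside the brackets), and \ (the backslash).

\ followed by a metacharacter removes that character's special meaning (escaping).

\ followed by certain ordinary characters gives them a special meaning (for example \d or \w).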

>>> re.findall('a[bc]d','abde')

['abd']

>>> re.findall('a[bc]d','acde')

['acd']

>>> re.findall('a[a-z]d','acde')

['acd']

>>> re.findall('a[a-z]d','acesde')

[]

>>> re.findall('a[a-z]+d','acesde')

['acesd']
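The examples above cover plain sets and ranges. A minimal sketch of the remaining cases (negation with ^, metacharacters losing their meaning inside brackets, and escaping with a backslash) might look like this; the sample strings are made up for illustration.

```python
import re

# ^ as the first character inside [] negates the set: any character except b or c
negated = re.findall(r"a[^bc]d", "abd acd aed a.d")

# Inside [], . and * are ordinary characters
literal_in_set = re.findall(r"a[.*]d", "a.d a*d abd")

# Outside [], an unescaped . matches any character ...
unescaped = re.findall(r"a.d", "a.d abd")

# ... while \. matches only a literal dot
escaped = re.findall(r"a\.d", "a.d abd")

print(negated)         # ['aed', 'a.d']
print(literal_in_set)  # ['a.d', 'a*d']
print(unescaped)       # ['a.d', 'abd']
print(escaped)         # ['a.d']
```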

match

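# match: tries to match from the start of the string; returns a match object on success and None on failure

match(pattern, string, flags=0)

# pattern: the regular expression

# string: the string to match against

# flags: matching mode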

X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
I  IGNORECASE  Perform case-insensitive matching.
M  MULTILINE   "^" matches the beginning of lines (after a newline)
               as well as the string.
               "$" matches the end of lines (before a newline) as well
               as the end of the string.
S  DOTALL      "." matches any character at all, including the newline.
A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D
               match the corresponding ASCII character categories
               (rather than the whole Unicode categories, which is the
               default).
               For bytes patterns, this flag is the only available
               behaviour and needn't be specified.
L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
U  UNICODE     For compatibility only. Ignored for string patterns (it
               is the default), and forbidden for bytes patterns.
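The effect of the more commonly used flags can be sketched as follows. The sample strings and the `phone` pattern are hypothetical and serve only to make the difference between flagged and unflagged calls visible.

```python
import re

text = "hello\nHello world"  # hypothetical two-line sample

# IGNORECASE alone: ^ still matches only at the start of the string
only_start = re.findall(r"^hello", text, re.I)
# IGNORECASE | MULTILINE: ^ also matches right after each newline
each_line = re.findall(r"^hello", text, re.I | re.M)

# Without DOTALL, . does not match the newline between the two words
no_dotall = re.findall(r"hello.Hello", text)
with_dotall = re.findall(r"hello.Hello", text, re.S)

# ASCII restricts \w to [a-zA-Z0-9_]
ascii_words = re.findall(r"\w+", "abc 张三", re.A)

# VERBOSE allows whitespace and comments inside the pattern
phone = re.compile(r"""
    (\d{3})   # first three digits
    -         # literal hyphen
    (\d{4})   # last four digits
""", re.X)
verbose_result = phone.findall("555-1234")

print(only_start)      # ['hello']
print(each_line)       # ['hello', 'Hello']
print(no_dotall)       # []
print(with_dotall)     # ['hello\nHello']
print(ascii_words)     # ['abc']
print(verbose_result)  # [('555', '1234')]
```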

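# Without groups

import re

origin = "hello alex bcd abcd lge acd 19"  # assumed sample string (the one defined in the findall section)

r = re.match(r"h\w+", origin)

print(r.group()) # the whole match

print(r.groups()) # tuple of all captured groups (empty here)

print(r.groupdict()) # dict of all named groups (empty here)

# With groups

# Why use groups? To extract specific parts of a successful match (the whole pattern must match first; the grouped parts are then pulled out of that match)

r = re.match(r"h(\w+).*(?P<name>\d)$", origin)

print(r.group()) # the whole match

print(r.groups()) # tuple of all captured groups

print(r.groupdict()) # dict of the groups that were given a name (key)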
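For reference, here is a self-contained sketch of the two match calls with the values they produce. It assumes the sample string used in the findall section; the variable names `plain` and `grouped` are introduced here for illustration. Because match returns None when the pattern does not match at position 0, the result is usually checked before calling group().

```python
import re

# Assumed sample string (taken from the findall section)
origin = "hello alex bcd abcd lge acd 19"

plain = re.match(r"h\w+", origin)
print(plain.group())      # hello
print(plain.groups())     # ()
print(plain.groupdict())  # {}

grouped = re.match(r"h(\w+).*(?P<name>\d)$", origin)
print(grouped.group())        # hello alex bcd abcd lge acd 19
print(grouped.groups())       # ('ello', '9')
print(grouped.groupdict())    # {'name': '9'}
print(grouped.group(1))       # ello
print(grouped.group("name"))  # 9

# match anchors at the start, so a pattern that begins elsewhere yields None
missing = re.match(r"alex", origin)
if missing is None:
    print("no match at the start of the string")
```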


search

# search: scans the whole string and returns the first match; returns None if nothing matches

# search(pattern, string, flags=0)

# Without groups

r = re.search(r"a\w+", origin)

print(r.group()) # the whole match

print(r.groups()) # tuple of all captured groups (empty here)

print(r.groupdict()) # dict of all named groups (empty here)

# With groups

r = re.search(r"a(\w+).*(?P<name>\d)$", origin)

print(r.group()) # the whole match

print(r.groups()) # tuple of all captured groups

print(r.groupdict()) # dict of the groups that were given a name (key)
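The difference between match and search is easiest to see by running the same pattern through both. The sketch below assumes the sample string from the findall section; the variable names are illustrative.

```python
import re

# Assumed sample string (taken from the findall section)
origin = "hello alex bcd abcd lge acd 19"

# match only tries position 0, so a pattern that starts with "a" fails here
m = re.match(r"a\w+", origin)
# search scans forward and stops at the first position where the pattern matches
s = re.search(r"a\w+", origin)

print(m)          # None
print(s.group())  # alex
print(s.span())   # (6, 10)

g = re.search(r"a(\w+).*(?P<name>\d)$", origin)
print(g.group())      # alex bcd abcd lge acd 19
print(g.groups())     # ('lex', '9')
print(g.groupdict())  # {'name': '9'}
```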


findall

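# findall: returns all non-overlapping matches as a list; with no group, each item is the full matched string; with one group, each item is the string captured by that group; with several groups, each item is a tuple of the captured strings

# Empty matches are included in the result

# findall(pattern, string, flags=0)

# Without groups

origin = "hello alex bcd abcd lge acd 19"

r = re.findall(r"a\w+", origin)

print(r)

# With groups

r = re.findall(r"a((\w*)c)(d)", origin)

print(r)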
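The three result shapes of findall, plus the note about empty matches, can be checked with the short sketch below. The `one_group` pattern and the `"a1"` input are additions made for illustration; the empty-match result shown assumes Python 3.7 or later.

```python
import re

origin = "hello alex bcd abcd lge acd 19"

# No group: a list of full matches
no_group = re.findall(r"a\w+", origin)
# One group: a list of the strings captured by that group
one_group = re.findall(r"a(\w+)", origin)
# Several groups: a list of tuples, one element per group (outer group first)
many_groups = re.findall(r"a((\w*)c)(d)", origin)
# A pattern that can match nothing also reports its empty matches
empties = re.findall(r"\d*", "a1")

print(no_group)     # ['alex', 'abcd', 'acd']
print(one_group)    # ['lex', 'bcd', 'cd']
print(many_groups)  # [('bc', 'b', 'd'), ('c', '', 'd')]
print(empties)      # ['', '1', '']
```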


sub

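# sub: replaces the parts of the string that match the pattern

sub(pattern, repl, string, count=0, flags=0)

# pattern: the regular expression

# repl: the replacement string, or a callable that returns it

# string: the string to match against

# count: maximum number of replacements (0 means replace all)

# flags: matching mode

# Groups do not change what gets replaced
origin = "hello alex bcd alex lge alex acd 19"
r = re.sub(r"a\w+", "999", origin, count=2)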

print(r)

hello 999 bcd 999 lge alex acd 19
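The parameter list above mentions that repl may also be a callable. A minimal sketch of that form, together with a backreference in the replacement string and the related `re.subn` function, is shown here; the lambda and the short `"hello alex"` input are illustrative choices.

```python
import re

origin = "hello alex bcd alex lge alex acd 19"

# repl as a callable: it receives each match object and returns the replacement text
upper = re.sub(r"a\w+", lambda m: m.group().upper(), origin, count=2)

# repl as a string may refer to captured groups with \1, \2, ...
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello alex")

# subn returns the new string together with the number of replacements made
replaced, n = re.subn(r"alex", "999", origin)

print(upper)     # hello ALEX bcd ALEX lge alex acd 19
print(swapped)   # alex hello
print(replaced)  # hello 999 bcd 999 lge 999 acd 19
print(n)         # 3
```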

split

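# split: splits the string wherever the pattern matches

split(pattern, string, maxsplit=0, flags=0)

# pattern: the regular expression

# string: the string to split

# maxsplit: maximum number of splits (0 means no limit)

# flags: matching mode

# Without groups
origin = "hello alex bcd alex lge alex acd 19"
r = re.split("alex", origin, maxsplit=1)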

print(r)

['hello ', ' bcd alex lge alex acd 19']

# With groups: the text captured by each group is also included in the result
origin = "hello alex bcd alex lge alex acd 19"

r1 = re.split("(alex)", origin, maxsplit=1)

print(r1)

['hello ', 'alex', ' bcd alex lge alex acd 19']

r2 = re.split("(al(ex))", origin, maxsplit=1)

print(r2)

['hello ', 'alex', 'ex', ' bcd alex lge alex acd 19']
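Where re.split goes beyond str.split is in splitting on a pattern rather than a fixed separator. The sketch below splits on any run of commas, semicolons, or whitespace; the input string is made up for illustration.

```python
import re

# Hypothetical input with mixed separators
line = "a, b;c  d"

# Split on one or more of: comma, semicolon, whitespace
parts = re.split(r"[,;\s]+", line)
# maxsplit limits the number of splits; the remainder is kept as the last item
limited = re.split(r"[,;\s]+", line, maxsplit=2)

print(parts)    # ['a', 'b', 'c', 'd']
print(limited)  # ['a', 'b', 'c  d']
```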

IP address:
^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$
Mobile phone number:
^1[3458][0-9]\d{8}$
Email:
[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+
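One way to put these three patterns to use is a small validation helper. The sketch below writes the patterns with their backslash escapes spelled out and uses `[3458]` for the second digit of the phone number; the `is_valid` helper and the test values are hypothetical, and the patterns are deliberately simple rather than complete validators.

```python
import re

# Patterns from the list above, with backslash escapes written out
IP = re.compile(r"^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$")
# [3458] is the set of allowed second digits (no literal "|" inside the brackets)
PHONE = re.compile(r"^1[3458][0-9]\d{8}$")
EMAIL = re.compile(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+")

def is_valid(pattern, value):
    """Hypothetical helper: True when the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None

print(is_valid(IP, "192.168.1.1"))          # True
print(is_valid(IP, "256.1.1.1"))            # False
print(is_valid(PHONE, "13812345678"))       # True
print(is_valid(PHONE, "12812345678"))       # False
print(is_valid(EMAIL, "user@example.com"))  # True
print(is_valid(EMAIL, "user@example"))      # False
```

Using fullmatch matters for the email pattern, which has no ^ and $ anchors of its own and would otherwise accept any string that merely contains an address.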

# re.compile() wraps a pattern in a reusable object whose methods can then be called directly

>>> origin = "hello alex bcd alex lge alex acd 19"

>>> regex = re.compile(r'alex')

>>> regex.findall(origin)

['alex', 'alex', 'alex']
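A compiled pattern offers the same operations as the module-level functions, without the pattern argument. The following sketch reuses `regex` for several calls; the second pattern `ci` is an illustrative addition showing that flags are supplied once, at compile time.

```python
import re

origin = "hello alex bcd alex lge alex acd 19"
regex = re.compile(r"alex")

first = regex.search(origin)                # first occurrence only
once = regex.sub("999", origin, count=1)    # replace only the first occurrence
halves = regex.split(origin, maxsplit=1)    # split at the first occurrence

print(first.span())   # (6, 10)
print(once)           # hello 999 bcd alex lge alex acd 19
print(halves)         # ['hello ', ' bcd alex lge alex acd 19']
print(regex.pattern)  # alex

# Flags are given when compiling, not on each call
ci = re.compile(r"ALEX", re.I)
print(len(ci.findall(origin)))  # 3
```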