Regular expressions in Python (the re module)

Special characters

"."  : Matches any character except a newline.
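A minimal sketch of "." in the same style as the examples that follow. The sample string and the variable names are made up for illustration; `re.DOTALL` is the standard flag that lets "." match a newline as well.

```python
import re

# "." matches any single character except a newline
dot_default = re.findall('a.c', 'abc a-c a\nc')
print(dot_default)
# Output: ['abc', 'a-c']

# With re.DOTALL, "." matches the newline too
dot_all = re.findall('a.c', 'abc a-c a\nc', re.DOTALL)
print(dot_all)
# Output: ['abc', 'a-c', 'a\nc']
```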
"^"  : Matches the start of the string.

import re
s = re.findall('^c234', 'ac2324')
s1 = re.findall('^ac', 'ac2324')
print(s)
print(s1)
# Output: []
#         ['ac']

"$"  : Matches the end of the string.

import re
s = re.findall('c234$', 'ac2324')
s1 = re.findall('ac2324$', 'ac2324')
print(s)
print(s1)
# Output: []
#         ['ac2324']

"*"  : Matches the preceding character zero or more times.

import re
s = re.findall('abc*', 'ab')
s1 = re.findall('abc*', 'abcc')
print(s)
print(s1)
# Output: ['ab']
#         ['abcc']

"+"  : Matches the preceding character one or more times.

import re
s = re.findall('abc+', 'ab')
s1 = re.findall('abc+', 'abc')
print(s)
print(s1)
# Output: []
#         ['abc']

"?"  : Matches the preceding character zero or one time.

import re
s = re.findall('ab222c?', 'ab222')
s1 = re.findall('ab222c?', 'ab222cccccc')
print(s)
print(s1)
# Output: ['ab222']
#         ['ab222c']

Python's re module provides the regular-expression operations described below.

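Characters:

    .  matches any character except a newline
    \w matches a letter, digit, or underscore, and (for str patterns, by default) other Unicode word characters such as Chinese characters
    \s matches any whitespace character
    \d matches a digit
    \b matches the start or end of a word
    ^  matches the start of the string
    $  matches the end of the string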
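The character classes above can be tried out as follows. This is only an illustrative sketch: the sample strings and variable names are invented here, and the patterns are written as raw strings so the backslashes reach the re module intact.

```python
import re

text = "room_7 is in 北京\t42"

words = re.findall(r'\w+', text)   # runs of word characters
digits = re.findall(r'\d+', text)  # runs of digits
spaces = re.findall(r'\s', text)   # single whitespace characters
print(words)
print(digits)
print(spaces)
# Output: ['room_7', 'is', 'in', '北京', '42']
#         ['7', '42']
#         [' ', ' ', ' ', '\t']

# \b only matches at a word boundary, so "in" inside "pin" or "inside" is skipped
whole_word = re.findall(r'\bin\b', 'in pin inside')
print(whole_word)
# Output: ['in']
```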

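Repetition:

    *     repeats zero or more times
    +     repeats one or more times
    ?     repeats zero or one time
    {n}   repeats exactly n times
    {n,}  repeats n or more times
    {n,m} repeats n to m times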
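The brace quantifiers have no example in this article, so here is a small sketch. The sample string is invented, and `\b` is added on both sides (my choice, not part of the quantifier syntax) so that each run of "a" is tested as a whole word.

```python
import re

text = "a aa aaa aaaa"

exactly_two = re.findall(r'\ba{2}\b', text)      # exactly 2
two_or_more = re.findall(r'\ba{2,}\b', text)     # 2 or more
two_to_three = re.findall(r'\ba{2,3}\b', text)   # between 2 and 3
print(exactly_two)
print(two_or_more)
print(two_to_three)
# Output: ['aa']
#         ['aa', 'aaa', 'aaaa']
#         ['aa', 'aaa']
```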

match

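# match: matches from the start of the string; returns a match object on success, or None on failure

 match(pattern, string, flags=0)
 # pattern: the regular expression
 # string : the string to match against
 # flags  : matching flags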
     X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
     I  IGNORECASE  Perform case-insensitive matching.
     M  MULTILINE   "^" matches the beginning of lines (after a newline)
                    as well as the string.
                    "$" matches the end of lines (before a newline) as well
                    as the end of the string.
     S  DOTALL      "." matches any character at all, including the newline.
 
     A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D
                    match the corresponding ASCII character categories
                    (rather than the whole Unicode categories, which is the
                    default).
                    For bytes patterns, this flag is the only available
                    behaviour and needn't be specified.
      
     L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
     U  UNICODE     For compatibility only. Ignored for string patterns (it
                    is the default), and forbidden for bytes patterns.
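A short sketch of how some of these flags change matching. The sample text and variable names are made up for illustration; the flags themselves (`re.I`, `re.M`, `re.S`, `re.X`) are the short names from the table above and can be combined with `|`.

```python
import re

text = "Hello\nhello world"

ignore_case = re.findall(r'hello', text, re.I)   # case-insensitive
no_flag = re.findall(r'^\w+', text)              # "^" only at the start of the string
multiline = re.findall(r'^\w+', text, re.M)      # "^" at the start of every line
dotall = re.match(r'Hello.hello', text, re.S)    # "." also matches "\n"
combined = re.findall(r'^HELLO', text, re.I | re.M)
print(ignore_case)
print(no_flag)
print(multiline)
print(dotall is not None)
print(combined)
# Output: ['Hello', 'hello']
#         ['Hello']
#         ['Hello', 'hello']
#         True
#         ['Hello', 'hello']

# re.X ignores whitespace and allows comments inside the pattern
date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.X)
print(date.match("2024-05").groups())
# Output: ('2024', '05')
```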

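import re

# Sample string (the same one used in the findall example later on)
origin = "hello alex bcd abcd lge acd 19"

# Without groups
r = re.match(r"h\w+", origin)
print(r.group())     # the entire match: 'hello'
print(r.groups())    # all captured groups as a tuple: ()
print(r.groupdict()) # all named groups as a dict: {}

# With groups
# Why use groups? To extract specific parts of a successful match
# (the whole pattern must match first; the grouped parts are then pulled out).
r = re.match(r"h(\w+).*(?P<name>\d)$", origin)
print(r.group())     # the entire match: 'hello alex bcd abcd lge acd 19'
print(r.groups())    # all captured groups: ('ello', '9')
print(r.groupdict()) # only the groups that were given a name: {'name': '9'}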


search

search: scans the whole string and returns the first match; returns None if nothing matches
search(pattern, string, flags=0)

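import re

# Sample string (the same one used in the findall example later on)
origin = "hello alex bcd abcd lge acd 19"

# Without groups
r = re.search(r"a\w+", origin)
print(r.group())     # the entire match: 'alex'
print(r.groups())    # all captured groups as a tuple: ()
print(r.groupdict()) # all named groups as a dict: {}

# With groups
r = re.search(r"a(\w+).*(?P<name>\d)$", origin)
print(r.group())     # the entire match: 'alex bcd abcd lge acd 19'
print(r.groups())    # all captured groups: ('lex', '9')
print(r.groupdict()) # only the groups that were given a name: {'name': '9'}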
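The difference between match and search is easiest to see on the same input. The sketch below reuses the article's sample string; the variable names are invented. Because both functions return None on failure, it helps to check the result before calling `.group()`.

```python
import re

origin = "hello alex bcd abcd lge acd 19"

m1 = re.match(r"alex", origin)    # None: "alex" is not at the start of the string
m2 = re.search(r"alex", origin)   # found further along the string
print(m1)
# Output: None

if m2 is not None:
    print(m2.group(), m2.span())
# Output: alex (6, 10)
```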

findall

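findall: returns a list of all non-overlapping matches. With no group in the pattern, each item is the whole matched string; with one group, each item is the string captured by that group; with several groups, each item is a tuple of the captured strings.
Empty matches are also included in the result.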
findall(pattern, string, flags=0)

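import re

origin = "hello alex bcd abcd lge acd 19"

# Without groups
r = re.findall(r"a\w+", origin)
print(r)  # ['alex', 'abcd', 'acd']

# With groups (the final "d" is a literal letter d)
r = re.findall(r"a((\w*)c)(d)", origin)
print(r)  # [('bc', 'b', 'd'), ('c', '', 'd')]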
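The return-shape rules for findall described above can be checked side by side. This is an illustrative sketch with invented variable names; `(?:...)` is the standard non-capturing group syntax, useful when parentheses are needed only for grouping and the whole match should still be returned.

```python
import re

origin = "hello alex bcd abcd lge acd 19"

one_group = re.findall(r"a(\w+)", origin)       # strings captured by the single group
two_groups = re.findall(r"(a)(\w+)", origin)    # tuples, one entry per group
non_capturing = re.findall(r"a(?:\w+)", origin) # whole matches again
print(one_group)
print(two_groups)
print(non_capturing)
# Output: ['lex', 'bcd', 'cd']
#         [('a', 'lex'), ('a', 'bcd'), ('a', 'cd')]
#         ['alex', 'abcd', 'acd']

# Empty matches are included: "x*" matches the empty string at every position
empties = re.findall(r"x*", "ab")
print(empties)
# Output: ['', '', '']
```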

split

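split: splits the string at each match of the pattern

split(pattern, string, maxsplit=0, flags=0)
pattern : the regular expression
string  : the string to split
maxsplit: maximum number of splits (0 means no limit)
flags   : matching flags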

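import re

# Without groups: the separator is dropped
origin = "hello alex bcd alex lge alex acd 19"
r = re.split("alex", origin, maxsplit=1)
print(r)   # ['hello ', ' bcd alex lge alex acd 19']

# With groups: the text captured by each group is kept in the result
r1 = re.split("(alex)", origin, maxsplit=1)
print(r1)  # ['hello ', 'alex', ' bcd alex lge alex acd 19']
r2 = re.split("(al(ex))", origin, maxsplit=1)
print(r2)  # ['hello ', 'alex', 'ex', ' bcd alex lge alex acd 19']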
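sub

sub: replaces the parts of the string that match the pattern
sub(pattern, repl, string, count=0, flags=0)
pattern: the regular expression
repl   : the replacement string, or a callable that returns it
string : the string to search
count  : maximum number of replacements (0 means replace all)
flags  : matching flags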

import re

# Groups do not affect which text gets replaced
origin = "hello alex bcd alex lge alex acd 19"
r = re.sub(r"a\w+", "999", origin, count=2)
print(r)  # hello 999 bcd 999 lge alex acd 19
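The repl argument may also be a callable, as the parameter list above notes, and a replacement string may refer back to groups. The sketch below is illustrative only: the helper name `shout` and the variable names are invented, while `\2\1` backreferences and `re.subn` are standard parts of the re module.

```python
import re

origin = "hello alex bcd alex lge alex acd 19"

def shout(m):
    # Called once per match; the return value replaces the matched text
    return m.group().upper()

upper = re.sub(r"a\w+", shout, origin, count=2)
print(upper)
# Output: hello ALEX bcd ALEX lge alex acd 19

# Backreferences in the replacement string: swap the two trailing digits
swapped = re.sub(r"(\d)(\d)$", r"\2\1", origin)
print(swapped)
# Output: hello alex bcd alex lge alex acd 91

# subn works like sub but also returns the number of replacements made
result, n = re.subn("alex", "X", origin)
print(result, n)
# Output: hello X bcd X lge X acd 19 3
```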

Common regular expressions

IP address:
^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$
Mobile phone number:
^1[3458][0-9]\d{8}$
Email:
[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+
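A rough sketch of how the three patterns in this section might be used for validation. It assumes the intended patterns are the ones with `\d` and `\.` written out, and that the phone pattern's character class was meant to be `[3458]`. The constant names and test inputs are invented, and `re.fullmatch` is used so that the email pattern, which has no anchors of its own, must cover the whole string. These patterns are simple checks, not complete validators.

```python
import re

# Assumed forms of the patterns above, written as raw strings
IP = r'^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$'
PHONE = r'^1[3458][0-9]\d{8}$'
EMAIL = r'[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+'

def is_valid(pattern, value):
    # fullmatch succeeds only if the pattern covers the entire string
    return re.fullmatch(pattern, value) is not None

print(is_valid(IP, '192.168.1.1'), is_valid(IP, '256.1.1.1'))
# Output: True False
print(is_valid(PHONE, '13812345678'), is_valid(PHONE, '12812345678'))
# Output: True False
print(is_valid(EMAIL, 'user_1@example.com'), is_valid(EMAIL, 'user@example'))
# Output: True False
```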