Python Regular Expressions

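I recently needed to parse some business data out of logs. Because most of the data in the logs is not well structured, my first thought was to parse it with regular expressions.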

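Python has included the re module since version 1.5; I am currently using Python 3.5.1. Below is a brief introduction to re.compile, re.match, re.search, and re.findall. Other commonly used functions such as re.sub and re.split are not covered here; anyone interested is welcome to discuss and learn them together.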

re.compile syntax:

    compile(pattern, flags=0)
        Compile a regular expression pattern, returning a pattern object.

re.compile builds a regular expression object from a pattern string. That object provides a set of methods for matching and substitution.

Parameter	Description
pattern	The regular expression
flags	Flags that control how the regular expression matches

For example:

import re
PATTERN_MATCH = re.compile(r'\d')  # \d matches a single digit
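As a minimal sketch of how a compiled pattern object and the flags argument might be used: the sample line and the names DIGITS and LEVEL below are made up for illustration and are not part of the log format discussed in this post.

```python
import re

# Hypothetical sample line, for illustration only
line = 'INFO order=12 retry=3'

# Compile once, then reuse the pattern object's own methods
DIGITS = re.compile(r'\d+')
digit_runs = DIGITS.findall(line)      # every run of digits in the line

# flags changes how the pattern matches; re.IGNORECASE ignores letter case
LEVEL = re.compile(r'info', re.IGNORECASE)
level_match = LEVEL.match(line)

print(digit_runs)
if level_match:
    print(level_match.group())
```

Compiling is mostly worthwhile when the same pattern is applied to many lines, as is typical when scanning a log file.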

re.match syntax:

    match(pattern, string, flags=0)
        Try to apply the pattern at the start of the string, returning a match object, or None if no match was found.

re.match tries to match at the start of the string. If the pattern does not match at the start, match() returns None.

Parameter	Description
pattern	The regular expression
string	The string to match against
flags	Flags that control how the regular expression matches

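import re

string = '[order]-2017-03-28 00:41:10,200 [SimpleAsyncTaskExecutor-26] INFO  c.c.t.t.o.t.t.OrderSubmitTransition - 订单[20170328003706846400696981925888]出票结束，返回结果[true]'
# First try a pattern that only occurs in the middle of the string:
# match() returns None because the string does not start with 32 digits
match = re.match(r'\d{32}', string)
if match:
    print('Matched order number:', match.group())
# Now match the beginning of the string
match = re.match(r'\[\w+\]', string)
if match:
    print('Matched app name:', match.group())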
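Since match() is anchored at the start, it suits fields that sit at the head of a log line. The sketch below assumes a shortened line in the same shape as the example above and uses capturing groups to pull out the app name and the date; the name HEAD and the shortened line are illustrative assumptions.

```python
import re

# Shortened, hypothetical log line in the same shape as the example above
line = '[order]-2017-03-28 00:41:10,200 [SimpleAsyncTaskExecutor-26] INFO'

# Anchored at the start: app name in brackets, a dash, then the date
HEAD = re.compile(r'\[(\w+)\]-(\d{4}-\d{2}-\d{2})')
m = HEAD.match(line)
app, date = (m.group(1), m.group(2)) if m else (None, None)

# A digits-only pattern fails with match() because the line starts with '['
no_match = re.match(r'\d{4}', line)

print(app, date, no_match)
```

group(0) (or group() with no argument) is the whole match, while group(1), group(2), and so on are the parenthesized parts.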

re.search syntax:

    search(pattern, string, flags=0)
        Scan through string looking for a match to the pattern, returning a match object, or None if no match was found.

re.search scans the whole string and returns the first successful match. If several substrings qualify, only the first one is returned.

The parameters are the same as for match, so they are not described again here.

import re

string = '[order]-2017-03-28 00:41:10,200 [SimpleAsyncTaskExecutor-26] INFO c.c.t.t.o.t.t.OrderSubmitTransition - 订单[20170328003706846400696981925888]出票结束，返回结果[true]'
search = re.search(r'\d{32}', string)
if search:
    print('Matched order number:', search.group())
# Match a bracketed string: [order], [20170328003706846400696981925888] and [true]
# all qualify, but search() stops as soon as it has matched [order]
search = re.search(r'\[\w+\]', string)
if search:
    print('Matched data in []:', search.group())
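To get only the order number without the surrounding brackets, one option is a capturing group combined with search(). This is a sketch under the assumption that the order number is always 32 digits inside brackets; the log line is a shortened, hypothetical variant of the one above.

```python
import re

# Shortened, hypothetical variant of the log line used above
line = 'INFO OrderSubmitTransition - order[20170328003706846400696981925888] done, result[true]'

# The parentheses capture just the digits; the brackets only anchor the match
found = re.search(r'\[(\d{32})\]', line)
order_id = found.group(1) if found else None

# span(1) gives the start and end index of the captured group
start, end = found.span(1) if found else (None, None)

print(order_id, start, end)
```

Always test the result before calling group(), because search() returns None when nothing matches.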

re.findall syntax:

    findall(pattern, string, flags=0)
        Return a list of all non-overlapping matches in the string.
        
        If one or more capturing groups are present in the pattern, return
        a list of groups; this will be a list of tuples if the pattern
        has more than one group.
        
        Empty matches are included in the result.

re.findall scans the whole string and returns every matching substring as a list, which makes up for re.search returning only the first match.

import re

string = '[order]-2017-03-28 00:41:10,200 [SimpleAsyncTaskExecutor-26] INFO  c.c.t.t.o.t.t.OrderSubmitTransition - 订单[20170328003706846400696981925888]出票结束，返回结果[true]'
findall = re.findall(r'\[\w+\]', string)
if findall:
    print('Matched data in []:', findall)
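The docstring's remark about capturing groups is easy to miss, so here is a small sketch of how the return value changes with the number of groups. The sample line and variable names are hypothetical and chosen only to keep the output short.

```python
import re

# Hypothetical sample line, for illustration only
line = 'order[20170328] result[true] retry[3]'

# No group: each item is the whole match, brackets included
whole = re.findall(r'\[\w+\]', line)

# One group: each item is only the text captured by the group
inner = re.findall(r'\[(\w+)\]', line)

# Two groups: each item is a tuple with one entry per group
pairs = re.findall(r'(\w+)\[(\w+)\]', line)

# No match at all: an empty list, never None
nothing = re.findall(r'\d{50}', line)

print(whole, inner, pairs, nothing)
```

Because findall() returns an empty list rather than None, a plain truth test on the result is enough to detect "no matches".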

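That concludes my walkthrough of the most commonly used matching functions in re. It reflects only my own understanding, which I deepened by reading several articles on how these functions are used. If anything here is wrong, please point it out so we can improve together.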