常用内置模块（三）--subprocess、re

一、subprocess模块

进程：一个正在运行的程序

子进程：在父进程运行的过程中在其内部又开启了一个进程，即子进程。

作用:用于执行系统命令

os.system也可以获取当前的进程信息，但是它只能打印到屏幕，而无法进行其他操作，有局限性。

 1 import  subprocess
 2 
 3 '''
 4 sh-3.2# ls /Users/egon/Desktop |grep txt$
 5 mysql.txt
 6 tt.txt
 7 事物.txt
 8 '''
 9 
10 res1=subprocess.Popen('ls /Users/jieli/Desktop',shell=True,stdout=subprocess.PIPE)
11 res=subprocess.Popen('grep txt$',shell=True,stdin=res1.stdout,
12                  stdout=subprocess.PIPE)
13 
14 print(res.stdout.read().decode('utf-8'))
15 
16 
17 #等同于上面,但是上面的优势在于,一个数据流可以和另外一个数据流交互,可以通过爬虫得到结果然后交给grep
18 res1=subprocess.Popen('ls /Users/jieli/Desktop |grep txt$',shell=True,stdout=subprocess.PIPE)
19 print(res1.stdout.read().decode('utf-8'))
20 
21 
22 #windows下:
23 # dir | findstr 'test*'
24 # dir | findstr 'txt$'
25 import subprocess
26 res1=subprocess.Popen(r'dir C:UsersAdministratorPycharmProjects	est函数备课',shell=True,stdout=subprocess.PIPE)
27 res=subprocess.Popen('findstr test*',shell=True,stdin=res1.stdout,
28                  stdout=subprocess.PIPE)
29 
30 print(res.stdout.read().decode('gbk')) #subprocess使用当前系统默认编码，得到结果为bytes类型，在windows下需要用gbk解码

二、re模块

1、什么是re

re是正则表达式，正则表达式是一些带有特殊意义的符号或符号的组合

2、常用匹配模式

最常用的有：
　　单个字符匹配：
　　　  w    字母数字下划线 
 　    s     所有不可见字符（
 	 f）
　　   d     所有数字
　　　　.     除了
以外的所有字符
　　　　^     字符串的开头，写在表达式的前面
　　　　$　 　 字符串的末尾，写在表达式的后面

　　范围匹配：
　　　　[abc]   括号内的一个字符     
       a|b     a或b

　　重复匹配
　　　　{}   {，m}：0到m之间，   {m,n}:m到n之前  ， {m}:必须是m
　　　  +    匹配1个或多个，会一直匹配到不满足条件为止，用“?”问号来阻止贪婪匹配(匹配最少满足条件的字符数)
       *    匹配0个或多个，会一直匹配到不满足条件为止，用“?”问号来阻止贪婪匹配(匹配最少满足条件的字符数)
       ?    匹配1个或0个

　　分组
　　　　()　　匹配括号内的表达式,提取括号中的表达式，不会改变原来的表达式逻辑意义

　　取消分组
　　　　(?: )

 1 import  subprocess
 2 
 3 '''
 4 sh-3.2# ls /Users/egon/Desktop |grep txt$
 5 mysql.txt
 6 tt.txt
 7 事物.txt
 8 '''
 9 
10 res1=subprocess.Popen('ls /Users/jieli/Desktop',shell=True,stdout=subprocess.PIPE)
11 res=subprocess.Popen('grep txt$',shell=True,stdin=res1.stdout,
12                  stdout=subprocess.PIPE)
13 
14 print(res.stdout.read().decode('utf-8'))
15 
16 
17 #等同于上面,但是上面的优势在于,一个数据流可以和另外一个数据流交互,可以通过爬虫得到结果然后交给grep
18 res1=subprocess.Popen('ls /Users/jieli/Desktop |grep txt$',shell=True,stdout=subprocess.PIPE)
19 print(res1.stdout.read().decode('utf-8'))
20 
21 
22 #windows下:
23 # dir | findstr 'test*'
24 # dir | findstr 'txt$'
25 import subprocess
26 res1=subprocess.Popen(r'dir C:UsersAdministratorPycharmProjects	est函数备课',shell=True,stdout=subprocess.PIPE)
27 res=subprocess.Popen('findstr test*',shell=True,stdin=res1.stdout,
28                  stdout=subprocess.PIPE)
29 
30 print(res.stdout.read().decode('gbk')) #subprocess使用当前系统默认编码，得到结果为bytes类型，在windows下需要用gbk解码

1 # 贪婪匹配  *  +    不是固定的特殊符号  只是一种现象
 2 # 会一直匹配到不满足条件为止 用问号来阻止贪婪匹配(匹配最少满足条件的字符数)
 3 
 4 print(re.findall("w+?", "ajshsjkdsd"))
 5 # ['a', 'j', 's', 'h', 's', 'j', 'k', 'd', 's', 'd']
 6 
 7 print(re.findall("w*?", "ajshsjkdsd"))
 8 # ['', '', '', '', '', '', '', '', '', '', '']
 9 
10 print(re.findall("w+?s", "ajshsjkdsd"))
11 # ['ajs', 'hs', 'jkds']
12 
13 print(re.findall("w*?s", "ajshsjkdsd"))
14 # ['ajs', 'hs', 'jkds']

常用符号

 1 # 贪婪匹配  *  +    不是固定的特殊符号  只是一种现象
 2 # 会一直匹配到不满足条件为止 用问号来阻止贪婪匹配(匹配最少满足条件的字符数)
 3 
 4 print(re.findall("w+?", "ajshsjkdsd"))
 5 # ['a', 'j', 's', 'h', 's', 'j', 'k', 'd', 's', 'd']
 6 
 7 print(re.findall("w*?", "ajshsjkdsd"))
 8 # ['', '', '', '', '', '', '', '', '', '', '']
 9 
10 print(re.findall("w+?s", "ajshsjkdsd"))
11 # ['ajs', 'hs', 'jkds']
12 
13 print(re.findall("w*?s", "ajshsjkdsd"))
14 # ['ajs', 'hs', 'jkds']

3、re模块的常用方法

(1).findall 　　从左往右查找所有满足条件的字符返回一个列表
(2).search　　返回第一个匹配的字符串，结果封装为对象
(3).match（不常用）　　匹配行首, 返回值与search相同
(4).compile（不常用）　将正则表达式封装为一个正则对象，可以重复使用这个表达式

 1 import re
 2 
 3 print(re.findall('w', src))
 4 # ['a', 'b', 'c', '_', 'd', '1', '2', '3', 'd', 'd', '5', 's', 'd']
 5 
 6 print(re.search('hello','weqwe hello dddd helllo dd'))
 7 # <_sre.SRE_Match object; span=(6, 11), match='hello'>
 8 
 9 print(re.match("hello"," world hello python"))
10 # None

4、分组

分组是从左边第一个左括号起，，index逐步增加，下面的1-4就是res=re.match(r"((a(b)c)(def))","abcdef")

 1 ts = "abcdef"
 2 reg = r"((a(b)c)(def))"
 3 regex = re.compile(reg)
 4 res = regex.match(ts)
 5 print(res)
 6 print(res.span()) # 匹配的结果的区间
 7 print(res.group(0)) # abcdef
 8 print(res.group(1)) # 1 -> 第一个()   abcdef
 9 print(res.group(2)) # abc
10 print(res.group(3)) # b
11 print(res.group(4)) # def
12 print(res.groups()) # ('abcdef','abc','b','def')