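Regular Expressions and the Three Musketeers (grep, sed, awk)

Lesson 12: Regular Expressions

1. Introduction to Regular Expressions

2. grep

3. sed

4. awk

5. Extras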


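1. Introduction to Regular Expressions

A regular expression (RE) is a pattern of characters used to match specified characters during a search.

Metacharacters are characters that stand for something other than their literal meaning. They are interpreted by whichever program performs the pattern matching, such as vi, grep, sed, and awk.

Basic metacharacters recognized by all pattern-matching tools on UNIX/Linux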

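Metacharacter    Function    Example    What it matches
^    Beginning-of-line anchor    /^love/    Lines beginning with love
$    End-of-line anchor    /love$/    Lines ending with love
.    Any single character    /l..e/    Lines containing an l, followed by any two characters, followed by an e
*    Zero or more of the preceding character    / *love/    Lines containing love preceded by zero or more spaces
[]    Any one character in the set    /[Ll]ove/    Lines containing love or Love
[x-y]    Any one character in the range    /[A-Z]ove/    An uppercase letter followed by ove
[^]    Any one character not in the set    /[^A-Z]/    Any single character outside the range A-Z
\    Escapes a metacharacter    /love\./    Lines containing love followed by a literal period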

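Extended metacharacters, supported by many (but not all) UNIX/Linux programs that use RE metacharacters

Metacharacter    Function    Example    What it matches
\<    Beginning-of-word anchor    /\<love/    Lines containing a word that begins with love
\>    End-of-word anchor    /love\>/    Lines containing a word that ends with love
\(..\)    Tags the matched characters for later use    /\(love\)able \1er/    Up to nine tags may be used, numbered from the leftmost. Here love is saved as tag 1 and referenced as \1, so the pattern matches loveable lover
x\{m\}, x\{m,\}, or x\{m,n\}    Repetition of character x: m times, at least m times, or at least m but no more than n times    o\{5,10\}    Lines containing 5 to 10 consecutive o's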
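
To make the interval and tag metacharacters concrete, here is a minimal sketch using grep. The word list is invented for this illustration and is not one of the lesson's sample files.

```shell
# Sample input invented for this illustration: one word per line.
words=$(printf '%s\n' 'boot' 'booooot' 'bot' 'mama' 'mami')

# o\{2,4\} : between 2 and 4 consecutive o's, here anchored between b and t.
interval=$(printf '%s\n' "$words" | grep '^bo\{2,4\}t$')

# \(ma\)\1 : tag "ma", then require the same text again through \1.
backref=$(printf '%s\n' "$words" | grep '\(ma\)\1')

printf 'interval: %s\n' "$interval"
printf 'backref:  %s\n' "$backref"
```

Only boot satisfies the interval (bot has too few o's, booooot too many), and only mama repeats the tagged text.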

Sample file for the basic metacharacters

// demonstrated with grep
root@lanquark:~/unixshellbysample/chap03# cat picnic 
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them?  I can only hope love
is forever. I live for you. It's hard to get back in the
groove.

Simple regular expression search

root@lanquark:~/demo# grep 'love' picnic 
I had a lovely time on our little picnic.
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them?  I can only hope love

Beginning-of-line anchor (^)

root@lanquark:~/demo# grep '^love' picnic 
love, how much I adore you. Do you know

End-of-line anchor ($)

root@lanquark:~/demo# grep 'love$' picnic 
clover. Did you see them?  I can only hope love

Any single character (.)

root@lanquark:~/demo# grep 'l.ve' picnic 
I had a lovely time on our little picnic.
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them?  I can only hope love
is forever. I live for you. It's hard to get back in the

Zero or more of the preceding character (*)

root@lanquark:~/demo# grep 'o*ve' picnic 
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them?  I can only hope love
is forever. I live for you. It's hard to get back in the
groove.

A set of characters ([ ])

root@lanquark:~/demo# grep '[Ll]ove' picnic 
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them?  I can only hope love

A range of characters ([ - ])

root@lanquark:~/demo# grep 'ove[a-z]' picnic 
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
I lost my gloves somewhere out in that field of
clover. Did you see them?  I can only hope love

Characters not in the set ([^ ])

root@lanquark:~/demo# grep 'ove[^a-zA-Z0-9]' picnic 
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
groove.

Sample file for the extended metacharacters

// demonstrated with grep or sed
root@lanquark:~/unixshellbysample/chap03# cat textfile 
Unusual occurrences happened at the fair.
Patty won fourth place in the 50 yard dash square and fair.
Occurrences like this are rare.
The winning ticket is 55222.
The ticket I got is 54333 and Dee got 55544.
Guy fell down while running around the south bend in his last event.

Beginning-of-word (\<) and end-of-word (\>) anchors

root@lanquark:~/demo# grep '\<fourth\>' textfile 
Patty won fourth place in the 50 yard dash square and fair.

Saving patterns with \( and \)

// replace occurrence with occurence, or Occurrence with Occurence
root@lanquark:~/unixshellbysample/chap03# sed 's#\([Oo]ccur\)rence#\1ence#' textfile 
Unusual occurences happened at the fair.
Patty won fourth place in the 50 yard dash square and fair.
Occurences like this are rare.
The winning ticket is 55222.
The ticket I got is 54333 and Dee got 55544.
Guy fell down while running around the south bend in his last event.


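2. grep

The name grep comes from "global regular expression print": it searches globally for a regular expression and prints the lines that match.

grep does not modify the input file in any way.

Command format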

grep word filename

root@lanquark:~# grep hjm /etc/passwd
hjm:x:5000:5000:hjm:/home/hjm:/bin/bash

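Regular expression metacharacters used by grep

Metacharacter    Function    Example    What it matches
^    Beginning-of-line anchor    '^love'    Lines beginning with love
$    End-of-line anchor    'love$'    Lines ending with love
.    Any single character    'l..e'    Lines containing an l, followed by any two characters, followed by an e
*    Zero or more of the preceding character    ' *love'    Lines containing love preceded by zero or more spaces
[ ]    Any one character in the set    '[Ll]ove'    Lines containing love or Love
[^]    Any one character not in the set    '[^A-K]'    Any single character outside the range A-K
\    Escapes a metacharacter    'love\.'    Lines containing love followed by a literal period
\<    Beginning-of-word anchor    '\<love'    Lines containing a word that begins with love
\>    End-of-word anchor    'love\>'    Lines containing a word that ends with love
\(..\)    Tags the matched characters for later use    '\(love\)ing'    Up to nine tags may be used, numbered from the leftmost. Here love is saved as tag 1 and can be referenced as \1
x\{m\}, x\{m,\}, or x\{m,n\}    Repetition of character x: m times, at least m times, or at least m but no more than n times    'o\{5,10\}'    Lines containing 5 to 10 consecutive o's

// sample file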
root@lanquark:~/demo# cat datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

Print all lines containing NW

root@lanquark:~/demo# grep NW datafile 
northwest	NW	Charles Main		3.0	.98	3	34

Print lines beginning with the letter n

root@lanquark:~/demo# grep '^n' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9

Print lines ending with the digit 4

root@lanquark:~/demo# grep '4$' datafile 
northwest	NW	Charles Main		3.0	.98	3	34

Print lines beginning with the letter w or e

root@lanquark:~/demo# grep '^[we]' datafile 
western		WE	Sharon Gray		5.3	.97	5	23
eastern		EA	TB Savage		4.4	.84	5	20

Print all lines containing a non-digit character

root@lanquark:~/demo# grep '[^0-9]' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

Print all lines containing an s, followed by zero or more additional s's and then a space

root@lanquark:~/demo# grep 'ss* ' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
southwest	SW	Lewis Dalsass		2.7	.8	2	18

Print lines containing at least nine consecutive lowercase letters

root@lanquark:~/demo# grep '[a-z]\{9\}' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
northeast 	NE	AM Main Jr.		5.1	.94	3	13

Print lines containing a 3 followed by a period and a digit, then any number of characters, then another 3

root@lanquark:~/demo# grep '\(3\)\.[0-9].*\1' datafile 
northwest	NW	Charles Main		3.0	.98	3	34

Print all lines containing a word that begins with north

root@lanquark:~/demo# grep '\<north' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
root@lanquark:~/demo# grep '\<north\>' datafile 
north		NO	Margot Weber		4.5	.89	5	 9

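Common grep options

Option    Function
-c    Prints a count of matching lines instead of the lines themselves
-i    Ignores case when comparing characters
-l    Lists only the names of the files that contain matching lines
-n    Prefixes each line with its line number within the file
-v    Inverts the match: prints only the lines that do not match
-w    Matches the expression as a whole word, as if it were enclosed in \< and \>
-A NUM    Also prints NUM lines after each matching line
-B NUM    Also prints NUM lines before each matching line
-C NUM    Also prints NUM lines before and after each matching line
-R    Recursively reads and processes every file under each listed directory, including its subdirectories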

Sample file

root@lanquark:~/demo# cat datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

-c prints the number of lines that begin with south

root@lanquark:~/demo# grep -c '^south' datafile 
3

-i ignores case

root@lanquark:~/demo# grep -i 'pat' datafile 
southeast 	SE	Patricia Hemenway	4.0	.7	4	17

-l prints only the names of files containing the pattern, not the matching text

root@lanquark:~/demo# grep -l 'SE' *
datafile
temp

-n prefixes each matching line with its line number

root@lanquark:~/demo# grep -n '^south' datafile 
3:southwest	SW	Lewis Dalsass		2.7	.8	2	18
4:southern	SO	Suan Chin		5.1	.95	4	15
5:southeast 	SE	Patricia Hemenway	4.0	.7	4	17

-v inverts the match

root@lanquark:~/demo# grep -v 'Suan Chin' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

-w matches the pattern only as a whole word, not as part of a word

root@lanquark:~/demo# grep -w 'north' datafile 
north		NO	Margot Weber		4.5	.89	5	 9

-A prints lines after each matching line (here, two)

root@lanquark:~/demo# grep -A 2 'NE' datafile 
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

-B prints lines before each matching line (here, two)

root@lanquark:~/demo# grep -B 2 'NE' datafile 
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13

-C prints lines before and after each matching line (here, two each)

root@lanquark:~/demo# grep -C 2 'NE' datafile 
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

-R searches for the pattern recursively

root@lanquark:~/demo# grep -R 'central' *
datafile:central		CT 	Ann Stephens		5.7	.94	5	13
test.dir/datafile:central		CT 	Ann Stephens		5.7	.94	5	13

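grep exit status

grep is handy in scripts because it always returns an exit status: 0 means the pattern was found, 1 means it was not found, and 2 means an error occurred, for example the file to search could not be found.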
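
A minimal sketch of using these exit statuses in a script. The scratch file and the `.missing` path are made up for the illustration, and `-q` (quiet) is a standard grep option that is not otherwise covered in this lesson.

```shell
# Hypothetical scratch file, created only for this illustration.
tmpfile=$(mktemp)
printf 'north\nsouth\n' > "$tmpfile"

# -q prints nothing; only the exit status matters.
if grep -q 'north' "$tmpfile"; then
    found="yes"
else
    found="no"
fi

# Pattern absent: grep exits with status 1.
status_nomatch=0
grep -q 'west' "$tmpfile" || status_nomatch=$?

# File absent: grep exits with status 2.
status_error=0
grep -q 'north' "$tmpfile.missing" 2>/dev/null || status_error=$?

echo "found=$found nomatch=$status_nomatch error=$status_error"
rm -f "$tmpfile"
```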

grep accepts input from files and from pipes

// list only the regular files in a directory
root@lanquark:~/demo# ls -l | grep '^-'
-rw-r--r--  1 root root     5 Jun  1 04:30 1111
-rw-r--r--  1 root root  1066 May 31 20:56 1.txt
-rw-r--r--  1 root root   351 Jun  4 23:04 datafile
-rw-r--r--  1 root root    18 Jun  4 21:54 id.txt
-rw-r--r--  1 root root   876 May 31 21:05 ipconfig.txt
-rw-r--r--  1 root root   338 Jun  4 23:09 picnic
-rw-r--r--+ 1 root root 18065 May 24 21:00 temp
-rw-r--r--  1 root root     0 Jun  1 04:25 test1.txt
-rw-r--r--  1 root root   277 Jun  4 23:17 textfile
-rw-r--r--+ 1 root root   572 Jun  1 04:29 tt.txt


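Extended grep: egrep

Invocation: egrep or grep -E

Regular expression metacharacters used by egrep

Metacharacter    Function    Example    What it matches
^    Beginning-of-line anchor    '^love'    Lines beginning with love
$    End-of-line anchor    'love$'    Lines ending with love
.    Any single character    'l..e'    Lines containing an l, followed by any two characters, followed by an e
*    Zero or more of the preceding character    ' *love'    Lines containing love preceded by zero or more spaces
[ ]    Any one character in the set    '[Ll]ove'    Lines containing love or Love
[^]    Any one character not in the set    '[^A-K]'    Any single character outside the range A-K
+    One or more of the preceding character    '[a-z]+ove'    One or more lowercase letters followed by ove
?    Zero or one of the preceding character    'lo?ve'    An l followed by zero or one o, then ve
a|b    Either a or b    'love|hate'    Either of the two expressions love or hate
()    Groups characters    'love(able|ly)', '(ov)+'    The first matches loveable or lovely; the second matches one or more occurrences of ov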

Sample file

root@lanquark:~/demo# cat datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

Print lines containing NW or EA

root@lanquark:~/demo# egrep 'NW|EA' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
eastern		EA	TB Savage		4.4	.84	5	20

Print all lines containing one or more 3s

root@lanquark:~/demo# egrep '3+' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
northeast 	NE	AM Main Jr.		5.1	.94	3	13
central		CT 	Ann Stephens		5.7	.94	5	13

Print all lines containing a 2, followed by zero or one period, followed by a digit

root@lanquark:~/demo# egrep '2\.?[0-9]' datafile 
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
eastern		EA	TB Savage		4.4	.84	5	20

Print lines containing one or more consecutive occurrences of the pattern no

root@lanquark:~/demo# egrep '(no)+' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9

Print all lines containing the letter S followed by h or u

root@lanquark:~/demo# egrep 'S(h|u)' datafile 
western		WE	Sharon Gray		5.3	.97	5	23
southern	SO	Suan Chin		5.1	.95	4	15


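3. sed

sed is a streamlined, non-interactive editor. By default it does not modify the original file.

sed processes a file (or other input) one line at a time and sends the result to the screen. The line currently being processed is held in a temporary buffer called the pattern space. When sed finishes processing the line in the pattern space, it sends that line to the screen, removes it from the pattern space, and reads the next line in.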
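
The following sketch checks this behavior on a scratch file (created with `mktemp` purely for the illustration): the edited text goes to standard output and the file on disk stays as it was. GNU sed also offers an `-i` option for in-place editing, which this sketch deliberately does not use.

```shell
# Hypothetical scratch file, created only for this illustration.
tmpfile=$(mktemp)
printf 'west\neast\n' > "$tmpfile"

before=$(cat "$tmpfile")
edited=$(sed 's/west/north/' "$tmpfile")   # the result goes to stdout only
after=$(cat "$tmpfile")

printf 'sed printed:\n%s\n' "$edited"
if [ "$before" = "$after" ]; then
    echo "file on disk is unchanged"
fi
rm -f "$tmpfile"
```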

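sed commands and options

Command    Function
a\    Appends one or more lines of text after the current line
c\    Changes (replaces) the text of the current line with new text
d    Deletes lines
i\    Inserts text before the current line
h    Copies the pattern space into the hold buffer
H    Appends the pattern space to the hold buffer
g    Copies the hold buffer into the pattern space, overwriting what was there
G    Appends the hold buffer to the pattern space, after its existing contents
l    Lists nonprinting characters
p    Prints lines
n    Reads the next input line and continues processing it with the next command rather than the first
q    Quits sed
r    Reads lines from a file
!    Applies the command to all lines except the selected ones
s    Substitutes one string for another
x    Exchanges the contents of the hold buffer and the pattern space
y    Translates one character into another (regular expressions cannot be used with y)

Substitution flags (used with s)
g    Substitutes globally within the line
p    Prints the line
w    Writes the line to a file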

sed options

Option    Function
-e    Allows multiple edits
-f    Names a script file containing the sed commands
-n    Suppresses the default output
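
The `-e` and `-n` options are demonstrated later in this section; `-f` is not, so here is a minimal sketch of it. The file names `regions` and `edit.sed` are hypothetical and exist only in a temporary directory.

```shell
# Hypothetical working directory and file names for this illustration.
workdir=$(mktemp -d)
printf 'northwest NW\nwestern WE\nsouthwest SW\n' > "$workdir/regions"

# A sed script file: one command per line, no shell quoting needed.
cat > "$workdir/edit.sed" <<'EOF'
1d
s/west/north/g
EOF

# -f reads the editing commands from the script file.
result=$(sed -f "$workdir/edit.sed" "$workdir/regions")
printf '%s\n' "$result"
rm -rf "$workdir"
```

The script deletes line 1 and replaces every west with north on the remaining lines, the same edits that could be given as two `-e` expressions.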

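sed metacharacters

Metacharacter    Function    Example    What it matches
^    Beginning-of-line anchor    /^love/    Lines beginning with love
$    End-of-line anchor    /love$/    Lines ending with love
.    Any single character    /l..e/    Lines containing an l, followed by any two characters, followed by an e
*    Zero or more of the preceding character    / *love/    Lines containing love preceded by zero or more spaces
[ ]    Any one character in the set    /[Ll]ove/    Lines containing love or Love
[^]    Any one character not in the set    /[^A-KM-Z]ove/    Lines containing ove where the character just before ove is not in the range A-K or M-Z
\(..\)    Saves the matched characters    s/\(love\)able/\1er/    Tags the pattern between the markers and saves it as tag 1, which can then be referenced as \1. Up to nine tags may be defined, numbered from the left
&    Saves the search string so it can be referenced in the replacement    s/love/aa&aa/    & stands for the search string; love is replaced by itself surrounded by aa, so love becomes aaloveaa
\<    Beginning-of-word anchor    /\<love/    Lines containing a word that begins with love
\>    End-of-word anchor    /love\>/    Lines containing a word that ends with love
x\{m\}    m consecutive x's    /o\{5\}/    Five consecutive o's
x\{m,\}    At least m consecutive x's    /o\{5,\}/    At least five consecutive o's
x\{m,n\}    At least m but no more than n consecutive x's    /o\{5,10\}/    At least five and at most ten consecutive o's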

Sample file

root@lanquark:~/demo# cat datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

Print: the p command

root@lanquark:~/demo# sed '/north/p' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

Suppressing the default output: -n

root@lanquark:~/demo# sed -n '/north/p' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9

Delete: the d command

// delete line 3
root@lanquark:~/demo# sed '3d' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

// delete from line 3 through the last line
root@lanquark:~/demo# sed '3,$d' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23

// delete the last line
root@lanquark:~/demo# sed '$d' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9

// delete lines containing the pattern north
root@lanquark:~/demo# sed '/north/d' datafile 
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
central		CT 	Ann Stephens		5.7	.94	5	13

Substitute: the s command

// replace west with north; g makes the substitution global within each line
root@lanquark:~/demo# sed 's#west#north#g' datafile 
northnorth	NW	Charles Main		3.0	.98	3	34
northern		WE	Sharon Gray		5.3	.97	5	23
southnorth	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

// & stands for the matched text
root@lanquark:~/demo# sed 's#[0-9][0-9]$#&.5#' datafile 
northwest	NW	Charles Main		3.0	.98	3	34.5
western		WE	Sharon Gray		5.3	.97	5	23.5
southwest	SW	Lewis Dalsass		2.7	.8	2	18.5
southern	SO	Suan Chin		5.1	.95	4	15.5
southeast 	SE	Patricia Hemenway	4.0	.7	4	17.5
eastern		EA	TB Savage		4.4	.84	5	20.5
northeast 	NE	AM Main Jr.		5.1	.94	3	13.5
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13.5

// suppress the default output so that only the changed lines are printed
root@lanquark:~/demo# sed -n 's#Hemenway#Jones#gp' datafile 
southeast 	SE	Patricia Jones	4.0	.7	4	17

// save matched characters with \( and \)
root@lanquark:~/demo# sed -n 's#\(Mar\)got#\1iance#p' datafile 
north		NO	Mariance Weber		4.5	.89	5	 9

Specifying a range of lines: the comma

// a range bounded by two regular expressions
root@lanquark:~/demo# sed -n '/west/,/east/p' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17

// a range bounded by a line number and a regular expression
root@lanquark:~/demo# sed -n '5,/^northeast/p' datafile 
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13

// a range bounded by two line numbers
root@lanquark:~/demo# sed -n '1,4p' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15

Multiple edits: the -e option

root@lanquark:~/demo# sed -e '1,3d' -e 's#Hemenway#Jones#' datafile 
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Jones	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

Reading from a file: the r command

root@lanquark:~/demo# cat newfile 
	______________________________________
	| *** SUAN HAS LEFT THE COMPANY ***  |
	|____________________________________|

root@lanquark:~/demo# sed '/Suan/r newfile' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
	______________________________________
	| *** SUAN HAS LEFT THE COMPANY ***  |
	|____________________________________|
	southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

Writing to a file: the w command

root@lanquark:~/demo# sed -n '/north/w newfile1' datafile 
root@lanquark:~/demo# cat newfile1
northwest	NW	Charles Main		3.0	.98	3	34
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9

Append: the a command

root@lanquark:~/demo# sed '/^north/a\--->THE NORTH SALES DISTRICT HAS MOVED<---' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
--->THE NORTH SALES DISTRICT HAS MOVED<---
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
--->THE NORTH SALES DISTRICT HAS MOVED<---
north		NO	Margot Weber		4.5	.89	5	 9
--->THE NORTH SALES DISTRICT HAS MOVED<---
central		CT 	Ann Stephens		5.7	.94	5	13

Insert: the i command

root@lanquark:~/demo# sed '/eastern/i\--->NEW ENGLIST REGION<---' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
--->NEW ENGLIST REGION<---
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

Change: the c command

root@lanquark:~/demo# sed '/eastern/c\THE EASTERN REGION HAS TEMPORARLLY CLOSED' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
THE EASTERN REGION HAS TEMPORARLLY CLOSED
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

Next line: the n command

root@lanquark:~/demo# sed -n '/eastern/{n;s#AM#Archie#p;}' datafile 
northeast 	NE	Archie Main Jr.		5.1	.94	3	13

Translate: the y command

root@lanquark:~/demo# sed '1,3y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' datafile 
NORTHWEST	NW	CHARLES MAIN		3.0	.98	3	34
WESTERN		WE	SHARON GRAY		5.3	.97	5	23
SOUTHWEST	SW	LEWIS DALSASS		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13

Quit: the q command

// quit after printing line 5
root@lanquark:~/demo# sed '5q' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17

// when the pattern matches, substitute first and then quit
root@lanquark:~/demo# sed '/Lewis/{s#Lewis#Joseph#;q;}' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Joseph Dalsass		2.7	.8	2	18

Holding and getting: the h and g commands

// the northeast line is printed twice; G appends the hold buffer
root@lanquark:~/demo# sed -e '/northeast/h' -e '$G' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13
northeast 	NE	AM Main Jr.		5.1	.94	3	13

// the WE line is printed only once: it is deleted where it stood and appended after the CT line
root@lanquark:~/demo# sed -e '/WE/{h;d;}' -e '/CT/{G;}' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
central		CT 	Ann Stephens		5.7	.94	5	13
western		WE	Sharon Gray		5.3	.97	5	23

// g overwrites the pattern space
root@lanquark:~/demo# sed -e '/WE/{h;d;}' -e '/CT/{g;}' datafile 
northwest	NW	Charles Main		3.0	.98	3	34
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
north		NO	Margot Weber		4.5	.89	5	 9
western		WE	Sharon Gray		5.3	.97	5	23

Holding and exchanging: the h and x commands

// x exchanges the pattern space and the hold buffer
root@lanquark:~/demo# sed -e '/Patricia/h' -e /Margot/x datafile 
northwest	NW	Charles Main		3.0	.98	3	34
western		WE	Sharon Gray		5.3	.97	5	23
southwest	SW	Lewis Dalsass		2.7	.8	2	18
southern	SO	Suan Chin		5.1	.95	4	15
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
eastern		EA	TB Savage		4.4	.84	5	20
northeast 	NE	AM Main Jr.		5.1	.94	3	13
southeast 	SE	Patricia Hemenway	4.0	.7	4	17
central		CT 	Ann Stephens		5.7	.94	5	13
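
As one more sketch of how the hold buffer can be used, the well-known one-liner below reverses the order of its input lines with nothing but `h`, `G`, and `p`. The three input lines are invented for the illustration.

```shell
# Sample input invented for this illustration.
input=$(printf '%s\n' 'first' 'second' 'third')

# 1!G : on every line except the first, append the hold buffer to the pattern space
# h   : copy the (growing) pattern space back into the hold buffer
# $p  : on the last line, print the accumulated result
reversed=$(printf '%s\n' "$input" | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```

Each new line is placed in front of everything read so far, so the last line printed out comes first.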


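4. awk

awk is a UNIX programming language for processing data and generating reports; gawk is the GNU version, which is the one usually found on Linux.

awk format: an awk instruction consists of a pattern, an action, or a combination of the two.

awk can take its input from a file, a pipe, or standard input.

1. Input from a file

Format: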
awk 'pattern' filename
awk '{action}' filename
awk 'pattern{action}' filename

// sample file
[root@lanquark demo]# cat employees
Tom Jones	4424	5/12/66	54335
Mary Adams	5346	11/4/63	28765
Sally Chang	1654	7/22/54	65000
Billy Black	1683	9/23/44	33650

// pattern only
[root@lanquark demo]# awk '/Mary/' employees
Mary Adams	5346	11/4/63	28765

// action only
[root@lanquark demo]# awk '{print $1}' employees
Tom
Mary
Sally
Billy

// pattern and action combined
[root@lanquark demo]# awk '/Sally/{print $1,$2}' employees
Sally Chang

2. Input from a command

Format:

command | awk 'pattern'
command | awk '{action}'
command | awk 'pattern{action}'

// pattern only
[root@lanquark demo]# cat employees | awk '/Mary/'
Mary Adams	5346	11/4/63	28765

// pattern and action
[root@lanquark demo]# cat employees | awk '/Mary/{print $1,$2}'
Mary Adams

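awk regular expression metacharacters

Metacharacter    Description
^    Matches at the beginning of the line
$    Matches at the end of the line
.    Matches any single character
*    Matches zero or more of the preceding character
+    Matches one or more of the preceding character
?    Matches zero or one of the preceding character
[ABC]    Matches any one character in the set (A, B, or C)
[^ABC]    Matches any one character not in the set (A, B, or C)
[A-Z]    Matches any one character in the range A to Z
A|B    Matches A or B
(AB)+    Matches one or more occurrences of AB, such as AB, ABAB, ABABAB
\*    Matches a literal asterisk
&    Used in a replacement string to stand for whatever the search pattern matched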

Sample file

[root@lanquark demo]# cat datafile1
northwest	NW	Joel Craig	3.0	.98	3	4
western	WE	Sharon Kelly	5.3	.97	5	23
southwest	SW	Chris Foster	2.7	.8	2	18
southern	SO	May Chin	5.1	.95	4	15
southeast	SE	Derek Johnson	4.0	.7	4	17
eastern	EA	Susan Beal	4.4	.84	5	20
northeast	NE	TJ Nichols	5.1	.94	3	13
north	NO	Val Shultz	4.5	.89	5	9
central	CT	Sheri Watson	5.7	.94	5	13

Simple pattern matching

[root@lanquark demo]#  awk '/west/' datafile1
northwest	NW	Joel Craig	3.0	.98	3	4
western	WE	Sharon Kelly	5.3	.97	5	23
southwest	SW	Chris Foster	2.7	.8	2	18


Matching the beginning of a line (^)

[root@lanquark demo]# awk '/^north/' datafile1
northwest	NW	Joel Craig	3.0	.98	3	4
northeast	NE	TJ Nichols	5.1	.94	3	13
north	NO	Val Shultz	4.5	.89	5	9

Matching lines that begin with no or so (|)

[root@lanquark demo]# awk '/^(no|so)/' datafile1
northwest	NW	Joel Craig	3.0	.98	3	4
southwest	SW	Chris Foster	2.7	.8	2	18
southern	SO	May Chin	5.1	.95	4	15
southeast	SE	Derek Johnson	4.0	.7	4	17
northeast	NE	TJ Nichols	5.1	.94	3	13
north	NO	Val Shultz	4.5	.89	5	9

Simple actions

[root@lanquark demo]# awk '{print $3,$2}' datafile1
Joel NW
Sharon WE
Chris SW
May SO
Derek SE
Susan EA
TJ NE
Val NO
Sheri CT

[root@lanquark demo]# awk '{print "number of fields:",NF}' datafile1
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8

Regular expressions in pattern and action combinations

[root@lanquark demo]# awk '/northeast/{print $3,$2}' datafile1
TJ NE

[root@lanquark demo]# awk '/^[ns]/{print $1}' datafile1
northwest
southwest
southern
southeast
northeast
north

The match operator (~)

[root@lanquark demo]# awk '$5~/\.[7-9]+/' datafile
southwest	SW	Lewis Dalsass		2.7	.8	2	18
central		CT 	Ann Stephens		5.7	.94	5	13

Input field separator (-F)

// no separator specified; the default is whitespace (spaces and tabs)
[root@lanquark demo]# head -n 5 /etc/passwd | awk '{print $1}'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

// set the separator to a colon
[root@lanquark demo]# head -n 5 /etc/passwd | awk -F: '{print $1}'
root
bin
daemon
adm
lp
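
The separator can also be set by assigning to the built-in variable FS in a BEGIN block, which has the same effect as `-F`. A minimal sketch, using one passwd-style line copied from the output above:

```shell
# One passwd-style record, taken from the listing above.
line='root:x:0:0:root:/root:/bin/bash'

# Separator given on the command line with -F.
with_flag=$(printf '%s\n' "$line" | awk -F: '{ print $1, $NF }')

# Same separator assigned to FS before any input is read.
with_begin=$(printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $1, $NF }')

printf '%s\n' "$with_flag" "$with_begin"
```

Both commands print the first and last fields, root and /bin/bash.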

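Comparison expressions

Relational operators

Operator    Meaning    Example
<    Less than    x < y
<=    Less than or equal to    x <= y
==    Equal to    x == y
!=    Not equal to    x != y
>=    Greater than or equal to    x >= y
>    Greater than    x > y
~    Matches the regular expression    x ~ /y/
!~    Does not match the regular expression    x !~ /y/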

Sample file

[root@lanquark demo]# cat employees 
Tom Jones	4424	5/12/66	54335
Mary Adams	5346	11/4/63	28765
Sally Chang	1654	7/22/54	65000
Billy Black	1683	9/23/44	33650

[root@lanquark demo]# awk '$3 == 5346' employees 
Mary Adams	5346	11/4/63	28765

[root@lanquark demo]# awk '$3>5000{print $1}' employees 
Mary

[root@lanquark demo]# awk '$2~/Adam/' employees 
Mary Adams	5346	11/4/63	28765

[root@lanquark demo]# awk '$2!~/Adam/' employees 
Tom Jones	4424	5/12/66	54335
Sally Chang	1654	7/22/54	65000
Billy Black	1683	9/23/44	33650

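Arithmetic

Arithmetic operators

Operator    Meaning    Example
+    Addition    x + y
-    Subtraction    x - y
*    Multiplication    x * y
/    Division    x / y
%    Modulus (remainder)    x % y
^    Exponentiation    x ^ y
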
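
A one-line sketch that exercises each operator in the table. The operands 7 and 2 are arbitrary values chosen for the illustration.

```shell
# All six arithmetic operators, evaluated in a BEGIN block (no input file needed).
result=$(awk 'BEGIN { print 7 + 2, 7 - 2, 7 * 2, 7 / 2, 7 % 2, 2 ^ 10 }')
echo "$result"
```

Note that division is not truncated to an integer: 7 / 2 yields 3.5.
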
[root@lanquark demo]# cat emp.data 
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18

[root@lanquark demo]# awk '$3>0{print $2*$3}' emp.data 
40
100
121
76.5

Logical operators and compound patterns

Operator    Meaning    Example
&&    Logical AND    a && b
||    Logical OR    a || b
!    Logical NOT    !a

Sample file (emp.data)

Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18

[root@lanquark demo]# awk '$3>10 && $3<22' emp.data
Mark 5.00 20
Susie 4.25 18
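
The example above uses only `&&`. As a sketch of the other two operators, the block below recreates emp.data in a temporary directory (the directory is an assumption of the illustration) and applies `||` and `!` to the same fields.

```shell
# Recreate the sample file in a scratch directory for this illustration.
workdir=$(mktemp -d)
cat > "$workdir/emp.data" <<'EOF'
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
EOF

# || : second field at least 5 OR third field at least 20.
either=$(awk '$2 >= 5 || $3 >= 20 { print $1 }' "$workdir/emp.data")

# !  : negate a condition; here, records whose third field is NOT greater than 0.
idle=$(awk '!($3 > 0) { print $1 }' "$workdir/emp.data")

printf 'either: %s\n' $either
printf 'idle:   %s\n' $idle
rm -rf "$workdir"
```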

Assignment operators

[root@lanquark demo]# awk '$3=="Chris"{$3="Christian";print}' datafile1
southwest SW Christian Foster 2.7 .8 2 18

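Built-in variables

Variable    Meaning
ARGC    Number of command-line arguments
ARGIND    Index in ARGV of the file currently being processed
ARGV    Array of command-line arguments
CONVFMT    Conversion format for numbers; the default is %.6g
ENVIRON    Array containing the values of the current shell environment variables
ERRNO    System error message set when a redirection fails, or when a read with getline or a call to close fails
FIELDWIDTHS    Whitespace-separated list of field widths, used instead of FS to split fixed-width input
FILENAME    Name of the current input file
FNR    Record number within the current file
FS    Input field separator; the default is a space
IGNORECASE    When set, regular expression and string matching ignore case
NF    Number of fields in the current record
NR    Number of records read so far
OFMT    Output format for numbers
OFS    Output field separator
ORS    Output record separator
RLENGTH    Length of the string matched by the match function
RS    Input record separator
RSTART    Offset of the string matched by the match function
RT    Record terminator; gawk sets RT to the input text that matched the character or regex specified by RS
SUBSEP    Array subscript separator
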
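
To show how NR, FNR, NF, and FILENAME relate, here is a small sketch that reads two files in one awk run. The file names `one.txt` and `two.txt` and their contents are invented for the illustration.

```shell
# Two hypothetical input files in a scratch directory.
workdir=$(mktemp -d)
printf 'a b\nc d e\n' > "$workdir/one.txt"
printf 'f\n'          > "$workdir/two.txt"

# For every record: file name, overall record number, per-file record number, field count.
report=$(cd "$workdir" && awk '{ print FILENAME, NR, FNR, NF }' one.txt two.txt)

printf '%s\n' "$report"
rm -rf "$workdir"
```

NR keeps counting across files (1, 2, 3), while FNR starts again at 1 when awk moves on to two.txt.
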
[root@lanquark demo]# cat employees2
Tom Jones:4424:5/12/66:54335
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:65000
Billy Black:1683:9/23/44:33650

[root@lanquark demo]# awk -F: '$1=="Mary Adams"{print NR,$1,$2,$NF}' employees2
2 Mary Adams 5346 28765

[root@lanquark demo]# awk -F: 'BEGIN{IGNORECASE=1};$1=="mary adams"{print NR,$1,$2,$NF}' employees2
2 Mary Adams 5346 28765

The BEGIN pattern

[root@lanquark demo]# awk 'BEGIN{FS=":";OFS="\t";ORS="\n\n"}{print $1,$2,$3}' employees2
Tom Jones	4424	5/12/66

Mary Adams	5346	11/4/63

Sally Chang	1654	7/22/54

Billy Black	1683	9/23/44


[root@lanquark demo]# awk 'BEGIN{print "Make Year"}'
Make Year

The END pattern

[root@lanquark demo]# awk 'END{print "The number of records is",NR}' employees2
The number of records is 4

[root@lanquark demo]# awk '/Mary/{count++}END{print "Mary was found",count,"times"}' employees2
Mary was found 1 times
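
END is also the natural place to report totals accumulated while reading. The sketch below recreates employees2 in a temporary directory (an assumption of the illustration) and sums the last field of every record.

```shell
# Recreate the sample file in a scratch directory for this illustration.
workdir=$(mktemp -d)
cat > "$workdir/employees2" <<'EOF'
Tom Jones:4424:5/12/66:54335
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:65000
Billy Black:1683:9/23/44:33650
EOF

# Accumulate the last field per record; print count, total, and average at the end.
summary=$(awk -F: '{ total += $NF } END { print NR, total, total / NR }' "$workdir/employees2")

echo "$summary"
rm -rf "$workdir"
```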

Redirection and pipes

Output redirection (> truncates the file first; >> appends without truncating)

[root@lanquark demo]# awk '$1=="Tom"{print $1}' employees2
Tom
[root@lanquark demo]# awk '$1=="Tom"{print $1>"passing_file"}' employees2
[root@lanquark demo]# cat passing_file 
Tom

Input redirection (getline)

[root@lanquark demo]# awk 'BEGIN{"date"|getline d;print d}'
Tue Jun  5 22:53:24 EDT 2018

[root@lanquark demo]# awk 'BEGIN{"date" | getline d;split(d,mon);print mon[2]}' 
Jun

[root@lanquark demo]# awk 'BEGIN{while("ls" | getline) print}'
1111
1.txt
datafile
datafile1
emp.data
employees
employees2
id.txt
ipconfig.txt
lab5.data
names
newfile
newfile1
passing_file
picnic
temp
test1.txt
test.dir
textfile
tt.txt

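Pipes

If a pipe has been opened in awk, it must be closed before another pipe can be opened. The command to the right of the pipe symbol is enclosed in double quotes.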

[root@lanquark demo]# cat names 
john smith 
alice cheba 
george goldberg 
susan goldberg 
tony tram 
barbara nguyen 
elizabeth lone 
dan savage 
eliza goldberg 
john goldenrod
[root@lanquark demo]# awk '{print $1,$2 | "sort -r +1 -2 +0 -1"}' names
tony tram
john smith
dan savage
barbara nguyen
elizabeth lone
john goldenrod
susan goldberg
george goldberg
eliza goldberg
alice cheba

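// closing the pipe: without close(), the output of sort appears only when awk exits, so "game over" is printed first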
[root@lanquark demo]# awk '{print $1,$2 | "sort -r +1 -2 +0 -1"}END{print "game over"}' names
game over
tony tram
john smith
dan savage
barbara nguyen
elizabeth lone
john goldenrod
susan goldberg
george goldberg
eliza goldberg
alice cheba
[root@lanquark demo]# awk '{print $1,$2 | "sort -r +1 -2 +0 -1"}END{close("sort -r +1 -2 +0 -1");print "game over"}' names
tony tram
john smith
dan savage
barbara nguyen
elizabeth lone
john goldenrod
susan goldberg
george goldberg
eliza goldberg
alice cheba
game over
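
The `+1 -2 +0 -1` arguments above are the old positional key syntax of sort, which current sort implementations may not accept. A sketch of the same ordering with the `-k` option follows; the shortened name list and the `LC_ALL=C` setting (to make the ordering predictable) are assumptions of this illustration.

```shell
# A shortened copy of the names list, for illustration only.
names=$(printf '%s\n' 'john smith' 'alice cheba' 'george goldberg' 'susan goldberg' 'tony tram')

# -k2,2 sorts on the second field, -k1,1 breaks ties on the first; -r reverses both.
sorted=$(printf '%s\n' "$names" |
    LC_ALL=C awk '{ print $1, $2 | "sort -r -k2,2 -k1,1" }')

printf '%s\n' "$sorted"
```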


5. Extras

Recursive filtering:

For example, to find the lines containing eval in all *.php files under the /data directory:

grep -r --include="*.php" 'eval' /data/
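
A self-contained sketch of the same idea that can be run without a real /data directory. The directory tree and file contents are invented for the illustration, and `--include` is assumed to be available (it is a GNU grep option); `-l` is added so that only file names are listed.

```shell
# Hypothetical directory tree, created only for this illustration.
workdir=$(mktemp -d)
mkdir -p "$workdir/data/sub"
printf '<?php eval($x); ?>\n'  > "$workdir/data/a.php"
printf '<?php echo 1; ?>\n'    > "$workdir/data/sub/b.php"
printf 'eval in a text file\n' > "$workdir/data/sub/notes.txt"

# Search recursively, but only inside files whose names match *.php.
hits=$(cd "$workdir" && grep -r -l --include="*.php" 'eval' data/)

printf '%s\n' "$hits"
rm -rf "$workdir"
```

Only data/a.php is reported: b.php does not contain eval, and notes.txt is skipped because its name does not match the include pattern.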

Exercises

http://www.apelearn.com/study_v2/chapter14.html

Original article: https://www.cnblogs.com/minn/p/9138351.html