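The Three Musketeers (grep, sed, awk) -- grep

The Linux grep command

The Linux grep command searches files for strings that match a given pattern.

grep looks for lines that contain the specified pattern. When a file's content matches, grep prints the matching lines by default. If no file name is given, or the file name is -, grep reads from standard input.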

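grep is mainly used to filter text. The grep family consists of:

grep:  searches files for the given regular expression and prints every line that contains a match
egrep: extended grep; supports more regular-expression metacharacters (same as grep -E)
fgrep: fixed grep, sometimes called fast grep; interprets every character literally (same as grep -F)

Recent GNU grep releases mark egrep and fgrep as obsolescent in favor of grep -E and grep -F.

Command format:

grep [options] PATTERN file1 file2 ...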

[root@egon ~]# grep 'root' /etc/passwd
[root@egon ~]# fgrep 'bash' /etc/passwd

Match found:		grep exits with status 0
No match found:		grep exits with status 1
Error (for example, the file does not exist):	grep exits with status 2
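
The three exit statuses can be observed directly from a script. The sketch below is only an illustration: the sample data and the helper name check are made up, and it relies on -q (no normal output) and -s (no error messages about missing files).

```shell
# Hypothetical sample data; any text file would do.
tmp=$(mktemp)
printf 'root:x:0:0\nsyy:x:1000:1000\n' > "$tmp"

# check PATTERN FILE: print the exit status that grep returns.
# -q suppresses normal output, -s suppresses "No such file" messages.
check() {
    grep -qs "$1" "$2"
    echo $?
}

found=$(check 'root' "$tmp")          # a line matches
missing=$(check 'nginx' "$tmp")       # no line matches
error=$(check 'root' "$tmp.absent")   # the file does not exist

echo "found=$found missing=$missing error=$error"
rm -f "$tmp"
```

In an if statement the same test is usually written as: if grep -q 'root' file; then ...; fi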

grep can also read its input from standard input or a pipe, not only from files, for example:

ps aux |grep 'nginx'

# grep, sed, and awk all support pipes

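Options

grep options 'regular expression' file-path

-c	: count the matching lines (count)
-n	: show line numbers
-w	: match whole words only; print only lines where the pattern matches a complete word (word)
-o	: print only the matched part of each line
-v	: invert the match; print only the lines that do not match
-r	: search directories recursively; -R does the same but also follows all symbolic links
-l	: print only the names of files that contain a match
-q	: quiet mode, no output; check $? to see whether a match was found
-i	: ignore case
-A n	: print each matching line and the n lines after it
-B n	: print each matching line and the n lines before it
-C n	: print each matching line and the n lines before and after it
-e	: specify a pattern; may be repeated to search for several patterns, and protects a pattern that begins with -
-E	: use extended regular expressions, same as egrep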
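
As a small illustration of -e, -c and -v working together, here is a sketch on made-up data (the file content is an assumption, not taken from a real system):

```shell
tmp=$(mktemp)
printf 'alpha\n-v is a flag\nbeta\ngamma\n' > "$tmp"

# Two -e options: a line is selected if it matches either pattern.
either=$(grep -e 'alpha' -e 'gamma' "$tmp")

# -e also protects a pattern that begins with a hyphen,
# which would otherwise be parsed as an option.
dash=$(grep -e '-v' "$tmp")

# -c counts matching lines; together with -v it counts the non-matching ones.
count=$(grep -c -v -e 'alpha' "$tmp")

printf '%s\n' "$either" "$dash" "$count"
rm -f "$tmp"
```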

Examples

# grep
[root@hass-11 ~]# grep "root" /etc/passwd
root:x:0:0:root:/root:/bin/sh
[root@hass-11 ~]# grep "^root" /etc/passwd
root:x:0:0:root:/root:/bin/sh

# -n
[root@hass-11 ~]# grep -n "root" /etc/passwd
1:root:x:0:0:root:/root:/bin/sh

# -w: match whole words; words are delimited by spaces or other non-word characters
[root@hass-11 ~]# grep -w "root" /etc/passwd
root:x:0:0:root:/root:/bin/sh

# -c
[root@hass-11 ~]# grep -c "root" /etc/passwd
2

# -v
[root@hass-11 ~]# grep -v "yy" /etc/passwd

# -o
[root@hass-11 ~]# grep -o "yy" /etc/passwd
yy
yy

# -r -l
[root@hass-11 ~]# grep -r "root" /etc/
/etc/grub.d/00_header:datarootdir="/usr/share"
[root@hass-11 ~]# grep -rl "root" /etc/
/etc/grub.d/00_header
[root@hass-11 ~]# grep -l "root" /etc/passwd /etc/group
/etc/passwd
/etc/group

# -E: + means the character to its left occurs one or more times
[root@hass-11 ~]# grep -E "^root|yy" /etc/passwd
root:x:0:0:root:/root:/bin/sh
syy:x:1000:1000::/home/syy:/bin/sh
[root@hass-11 ~]# grep -E "[0-9]+" /etc/passwd
[root@hass-11 ~]# egrep "[0-9]+" /etc/passwd
[root@hass-11 ~]# echo this is a test line. | grep -o -E "[a-z]+\."
line.
[root@hass-11 ~]# echo this is a test line. | grep -o -E "[a-z]\."
e.

# -C
[root@hass-11 ~]# grep -C2 "yy" /etc/passwd

# ^ anchors the match to the start of the line
[root@hass-11 ~]# grep "^root" /etc/passwd
root:x:0:0:root:/root:/bin/sh

# $ anchors the match to the end of the line
[root@hass-11 ~]# grep "sh$" /etc/passwd
root:x:0:0:root:/root:/bin/sh

# Highlighting
[root@hass-11 ~]# alias grep
alias grep='grep --color=auto'

# Combining find and grep
find / -type f -name "*.log" | xargs grep "ERROR"

# Searching several files
grep "match_pattern" file_1 file_2 file_3 ...
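
When find feeds file names to grep through xargs, names that contain spaces are split into several arguments. One common way around this is to pass the names NUL-separated. The sketch below uses a throwaway directory and invented file names:

```shell
dir=$(mktemp -d)
printf 'ok\nERROR disk full\n' > "$dir/app one.log"   # note the space in the name
printf 'all good\n'            > "$dir/other.log"
printf 'ERROR ignored\n'       > "$dir/notes.txt"     # not a .log file

# -print0 and -0 pass file names NUL-separated, so the space survives.
# grep -l prints only the names of the files that contain a match.
hits=$(find "$dir" -type f -name '*.log' -print0 | xargs -0 grep -l 'ERROR')

echo "$hits"
rm -rf "$dir"
```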

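Including / excluding files

Options involved:
(1) --include: search only files whose names match the given glob; can be combined with -r
(2) --exclude: skip files whose names match the given glob; can be combined with -r
(3) --exclude-dir: skip directories whose names match the given glob

Examples:

(1) Search the .c and .cpp files under src for lines containing main (quote each glob so the shell does not expand it):
grep -r "main" ./src --include='*.c' --include='*.cpp'

(2) Search src for lines containing main, but skip the readme file:
grep -r "main" ./src --exclude "readme"

(3) Search src for lines containing main, but skip the .git directory:
grep -r "main" ./src --exclude-dir ".git"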
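
The three options can be tried on a small throwaway tree. This is a sketch only; the directory layout and the file build.mk are invented for the demonstration.

```shell
dir=$(mktemp -d)
mkdir -p "$dir/src/.git"
printf 'int main(void) { return 0; }\n' > "$dir/src/app.c"
printf 'int main() {}\n'                > "$dir/src/tool.cpp"
printf 'main: app.o\n'                  > "$dir/src/build.mk"
printf 'main notes\n'                   > "$dir/src/readme"
printf 'main ref\n'                     > "$dir/src/.git/HEAD"

# Only .c and .cpp files are searched.
only_c=$(cd "$dir" && grep -rl 'main' ./src --include='*.c' --include='*.cpp' | sort)

# Everything is searched except the readme file and the .git directory.
filtered=$(cd "$dir" && grep -rl 'main' ./src --exclude='readme' --exclude-dir='.git' | sort)

printf '%s\n' "$only_c"
printf '%s\n' "$filtered"
rm -rf "$dir"
```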

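Special characters

Metacharacters: special keyboard symbols that the bash interpreter parses and gives a special meaning
Wildcards: wildcards are one kind of metacharacter; the same symbol can mean different things in different commands
Basic regular expressions: special keyboard symbols parsed by the command itself, such as ^ $ []
Extended regular expressions: special keyboard symbols parsed by the command itself, such as | + ?

Shell metacharacters (also called wildcards): parsed by the shell. In rm -rf *.pdf, for example, the shell expands * to any number of characters
Regular-expression metacharacters: parsed by the programs that perform pattern matching, such as vi, grep, sed, and awk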
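
Because the same character can be a shell wildcard and a regex metacharacter, a pattern should be quoted so that it reaches grep unchanged. A minimal sketch in a scratch directory (the file names ab and aab are invented):

```shell
dir=$(mktemp -d)
touch "$dir/ab" "$dir/aab"
printf 'b\nab\naab\nxyz\n' > "$dir/data.txt"

# Unquoted, the shell replaces a*b with the matching file names before the
# command runs, so grep would receive "aab" as its pattern and "ab" as a file.
expanded=$(cd "$dir" && echo a*b)

# Quoted, grep receives the regex a*b: zero or more "a" followed by "b".
matched=$(grep -c 'a*b' "$dir/data.txt")

echo "shell expansion: $expanded"
echo "lines matched by the regex: $matched"
rm -rf "$dir"
```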

Word boundaries

[root@hass-11 ~]# cat a.txt 
root
roott
roo
[root@hass-11 ~]# grep "root" a.txt 
root
roott

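# Method 1
[root@hass-11 ~]# grep -w "root" a.txt 
root
# Method 2
# Words are usually delimited by spaces or special characters, and a run of word characters counts as one word. \< marks the start of a word, \> the end
[root@hass-11 ~]# grep "\<root\>" a.txt 
root
# Method 3
[root@hass-11 ~]# grep "@root@" a.txt 	# no effect: a boundary must be written with the designated escapes
[root@hass-11 ~]# grep "\broot\b" a.txt 
root
# Method 4: grouping alone does not set a boundary, so roott still matches
[root@hass-11 ~]# cat a.txt 
root
roott
roo
[root@hass-11 ~]# grep '\(root\)' a.txt 
root
roott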
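
The boundary methods can be compared side by side by counting matching lines. The sample lines below are invented; \<, \> and \b are GNU grep extensions, so this sketch assumes GNU grep.

```shell
tmp=$(mktemp)
printf 'root\nroott\nroo\nmy root dir\nchroot\n' > "$tmp"

plain=$(grep -c 'root' "$tmp")        # substring match
word=$(grep -c -w 'root' "$tmp")      # -w option
angle=$(grep -c '\<root\>' "$tmp")    # word start and word end
edge=$(grep -c '\broot\b' "$tmp")     # \b matches at either edge of a word
start=$(grep -c '\<root' "$tmp")      # boundary on the left side only

echo "plain=$plain word=$word angle=$angle edge=$edge start=$start"
rm -f "$tmp"
```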

Grouping

Grouping makes a pattern shorter to write: a group can be repeated as a unit and referred to again later

[root@hass-11 ~]# cat c.txt 
abababababab1ab
abababababab2cd
abababababab3ef
[root@hass-11 ~]# grep '\(ab\)*1ab' c.txt 
abababababab1ab
[root@hass-11 ~]# grep '\(ab\)*1\1' c.txt 
abababababab1ab
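
A back-reference such as \1 matches the same text that the first group matched. The following sketch, on invented sample lines, shows the BRE and ERE spellings and a typical use, finding a doubled word (assuming GNU grep for \< and \>):

```shell
tmp=$(mktemp)
printf 'abab1ab\nabab2cd\nthe the cat\nthe then\n' > "$tmp"

# BRE: \(ab\)* repeats the whole group; \1 repeats what the group matched.
bre=$(grep '\(ab\)*1\1' "$tmp")

# ERE: parentheses group without backslashes.
ere=$(grep -E '(ab)+1' "$tmp")

# A doubled word: a word, one space, then the same word again.
doubled=$(grep '\<\([a-z][a-z]*\) \1\>' "$tmp")

printf '%s\n' "$bre" "$ere" "$doubled"
rm -f "$tmp"
```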

The dot .

The dot . matches any single character except a newline

[root@hass-11 ~]# echo -e "11\n111\n1111\n11111" > a.txt
[root@hass-11 ~]# grep 1.1 a.txt
111
1111
11111
[root@hass-11 ~]# grep 1..1 a.txt
1111
11111

# Use -w so that the pattern has to match a whole word rather than part of a longer one
[root@hass-11 ~]# grep -w 1.1 a.txt 
111
[root@hass-11 ~]# grep -w 1..1 a.txt 
1111
[root@hass-11 ~]# grep -w 1...1 a.txt 
11111
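
Since the dot matches any character, a literal dot has to be escaped, put in brackets, or searched with -F. A short sketch on made-up data:

```shell
tmp=$(mktemp)
printf '1.1\n111\n1a1\nv1.10\n' > "$tmp"

any=$(grep -c '1.1' "$tmp")        # . matches any character
escaped=$(grep -c '1\.1' "$tmp")   # \. matches only a literal dot
bracket=$(grep -c '1[.]1' "$tmp")  # inside [] the dot is literal too
fixed=$(grep -c -F '1.1' "$tmp")   # -F treats the whole pattern literally

echo "any=$any escaped=$escaped bracket=$bracket fixed=$fixed"
rm -f "$tmp"
```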

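[]

An extension of .

[]			: matches any one character from the listed set
[a-z]		: matches any one lowercase letter
[A-Z]		: matches any one uppercase letter
[a-Z]		: locale-dependent; matches any one letter in some locales but is rejected as an invalid range in the C locale, so prefer [a-zA-Z]
[a-zA-Z]	: matches any one letter
[0-9]		: matches any one digit
[0-9][0-9]	: matches two digits
[0-9a-zA-Z]	: matches any one digit or letter
[-+*/]		: matches any one of + - * /; the hyphen must come first or last, not in the middle

[ ]			: matches a space
$'[\t]*'	: matches TAB characters
^[]			: starts with a character from the set
[]$			: ends with a character from the set
a[^0-9]b	: negation; still matches exactly one character, any character not in the set
a[^0-9]*b	: zero or more characters that are not in the set, between a and b

# Create files in bulk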
[root@hass-11 num]# touch {a..z}
[root@hass-11 num]# touch {A..Z}
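
A few of the bracket expressions above, tried on invented sample lines. The sketch sets the C locale, which is an assumption made here so that a-z and A-Z are plain ASCII ranges.

```shell
export LC_ALL=C   # assumption: C locale, so ranges are plain ASCII
tmp=$(mktemp)
printf 'a1b\naxb\na-b\n3+4\n3 4\nABC\n' > "$tmp"

digit_mid=$(grep -c 'a[0-9]b' "$tmp")       # a1b
nondigit_mid=$(grep -c 'a[^0-9]b' "$tmp")   # axb and a-b
operators=$(grep -c '[-+*/]' "$tmp")        # a-b and 3+4; the hyphen is listed first
upper_only=$(grep -c '^[A-Z]*$' "$tmp")     # ABC

echo "$digit_mid $nondigit_mid $operators $upper_only"
rm -f "$tmp"
```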

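The star *

* means the single character to its left occurs zero or more times in a row; other characters match as written
	In other words, the line matches whether or not the character left of * is present
	.* matches any character, zero or more times
	* is greedy: it consumes as many characters as possible
	.*? is the non-greedy form (Perl syntax, available with grep -P)

[root@hass-11 ~]# echo -e "ab\nabb\naab\naabbb" > a.txt
[root@hass-11 ~]# grep 'a*b' a.txt	# a occurs zero or more times
ab
abb
aab
aabbb
[root@hass-11 ~]# grep 'a*bb' a.txt
abb
aabbb
[root@hass-11 ~]# grep 'a*bbb' a.txt
aabbb
[root@hass-11 ~]# grep 'aa*b' a.txt
ab
abb
aab
aabbb
[root@hass-11 ~]# grep 'aaa*b' a.txt
aab
aabbb
[root@hass-11 ~]# grep 'aaaa*b' a.txt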

# grep matches greedily by default
# By default grep has no non-greedy quantifier, but grep -P enables Perl syntax, which supports .*?
[root@hass-11 ~]# cat a.txt 
<a href="http://www.baidu.com">"百度"</a>
<a href="http://www.sina.com.cn">"新浪"</a>
[root@hass-11 ~]# grep 'href=".*"' a.txt 
<a href="http://www.baidu.com">"百度"</a>
<a href="http://www.sina.com.cn">"新浪"</a>
[root@hass-11 ~]# grep -P 'href=".*?"' a.txt 
<a href="http://www.baidu.com">"百度"</a>
<a href="http://www.sina.com.cn">"新浪"</a>
[root@hass-11 ~]# grep -Po 'href=".*?"' a.txt 
href="http://www.baidu.com"
href="http://www.sina.com.cn"
[root@hass-11 ~]# grep -Po 'href=".*?"' a.txt |awk -F'["]' '{print $2}'
http://www.baidu.com
http://www.sina.com.cn
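
Where grep -P is not available, a negated bracket expression gives a similar result without a non-greedy quantifier. This is a sketch of that alternative technique, not the method used in the transcript above; the sample line is adapted with an ASCII link text.

```shell
line='<a href="http://www.baidu.com">"link"</a>'

# Greedy: .* runs to the last double quote on the line.
greedy=$(printf '%s\n' "$line" | grep -o 'href=".*"')

# [^"]* cannot cross a double quote, so the match stops at the first closing quote.
bounded=$(printf '%s\n' "$line" | grep -o 'href="[^"]*"')

# cut splits on the double quote; field 2 is the URL itself.
url=$(printf '%s\n' "$line" | grep -o 'href="[^"]*"' | cut -d'"' -f2)

printf '%s\n' "$greedy" "$bounded" "$url"
```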

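Extended regular expressions

Extensions of *

a{m,n}b		: a occurs m to n times, b occurs once
a{,n}b		: a occurs 0 to n times, b occurs once
a{m,}b		: a occurs m or more times, b occurs once
a{0,}b		: a occurs 0 or more times, b occurs once; here {0,} = *

ab{m,n}		: a occurs once, b occurs m to n times
ab{m,}		: a occurs once, b occurs m or more times
ab{,n}		: a occurs once, b occurs 0 to n times
ab{0,}		: a occurs once, b occurs 0 or more times; here {0,} = *

ab+			: the character left of + occurs one or more times, same as {1,}
ab?			: the character left of ? occurs zero or one time, same as {0,1}

[root@hass-11 ~]# echo -e "233\nab\nabb\naab\naabbb" > a.txt

[root@hass-11 ~]# grep -E 'a?b' a.txt 	# a occurs 0 or 1 times, b occurs once
ab
abb
aab
aabbb
[root@hass-11 ~]# grep -E 'a+b' a.txt 	# a occurs 1 or more times, b occurs once
ab
abb
aab
aabbb

[root@hass-11 ~]# grep -E 'aab' a.txt 	# the literal string aab, anywhere in the line
aab
aabbb
[root@hass-11 ~]# grep -E 'a{2}b' a.txt 	# a occurs exactly 2 times, b occurs once
aab
aabbb

[root@hass-11 ~]# grep -E 'a{2,3}b' a.txt 	# a occurs 2 to 3 times, b occurs once; a is matched greedily
aab
aabbb

[root@hass-11 ~]# grep -E 'a{2,}b' a.txt 	# a occurs 2 or more times, b occurs once
aab
aabbb

[root@hass-11 ~]# grep -E 'a{,2}b' a.txt  	# a occurs 0 to 2 times, b occurs once
ab
abb
aab
aabbb

[root@hass-11 ~]# grep -E 'ab{2,}' a.txt 	# a occurs once, b occurs 2 or more times
abb
aabbb

[root@hass-11 ~]# grep -E 'a*b' a.txt 	# a occurs zero or more times, b occurs once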
ab
abb
aab
aabbb
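
Because grep prints a line as soon as the pattern matches anywhere in it, interval expressions usually need anchors to limit the length of the whole line. A sketch on invented numbers:

```shell
tmp=$(mktemp)
printf '12\n123\n12345\n123456\n12a45\n' > "$tmp"

# Anchored: the whole line must be 3 to 5 digits.
three_to_five=$(grep -E -c '^[0-9]{3,5}$' "$tmp")

# Unanchored: any run of 3 digits somewhere in the line is enough.
any_three=$(grep -E -c '[0-9]{3}' "$tmp")

# The same anchored test in basic regex syntax, where the braces are escaped.
bre=$(grep -c '^[0-9]\{3,5\}$' "$tmp")

echo "three_to_five=$three_to_five any_three=$any_three bre=$bre"
rm -f "$tmp"
```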

|

In an extended regular expression, | means "or"; it is not a pipe

[root@hass-11 ~]# cat b.txt 
company
companies
com
comp
111
[root@hass-11 ~]# egrep 'company|companies' b.txt 
company
companies
[root@hass-11 ~]# egrep 'compan(y|ies)' b.txt 
company
companies
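
Parentheses also decide how far an anchor reaches in an alternation. The sketch below uses made-up passwd-style lines to show the difference:

```shell
tmp=$(mktemp)
printf 'root:x:0\nsyy:x:1000\nyy:x:1001\nadmin:/root\n' > "$tmp"

# Without parentheses the ^ belongs only to the first alternative.
loose=$(grep -E -c '^root|yy' "$tmp")      # root..., syy..., yy...

# With parentheses the ^ applies to both alternatives.
strict=$(grep -E -c '^(root|yy)' "$tmp")   # root..., yy...

# Several -e patterns are another way to say "or".
multi=$(grep -c -e '^root' -e '^yy' "$tmp")

echo "loose=$loose strict=$strict multi=$multi"
rm -f "$tmp"
```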

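Newlines and tabs

[root@hass-11 ~]# cat d.txt 
aaaaaaa
    # 4 spaces
	# one TAB
bbbbbbb
##############

[root@hass-11 ~]# egrep -v $'\n' d.txt # grep splits the pattern at the newline into two empty patterns, which match every line, so -v excludes all lines
[root@hass-11 ~]# egrep -v $'\t' d.txt 	# exclude lines that contain a tab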
aaaaaaa
     
bbbbbbb
##############
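
The $'\t' quoting is a bash feature. A sketch of two more portable ways to match a tab, using printf to produce the character and the [[:blank:]] class; the sample lines are invented:

```shell
tmp=$(mktemp)
printf 'plain\n\tindented with tab\n    indented with spaces\ncol1\tcol2\n' > "$tmp"

# Store a real tab character in a variable.
tab=$(printf '\t')

with_tab=$(grep -c "$tab" "$tmp")              # lines containing a tab anywhere
lead_blank=$(grep -c '^[[:blank:]]' "$tmp")    # lines starting with a space or a tab

echo "with_tab=$with_tab lead_blank=$lead_blank"
rm -f "$tmp"
```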

Working with configuration files

[root@hass-11 ~]# cat d.txt 
aaaaaaa
     # 4 spaces
	# 1 tab
bbbbbbb
##############

# Exclude lines that start with # and empty lines (lines containing only spaces or tabs are not matched by ^$)
[root@hass-11 ~]# egrep -v '^#|^$' d.txt 
aaaaaaa
	
bbbbbbb

# Exclude lines that start with #, empty lines, lines that start with a space, and lines that start with a tab
[root@hass-11 ~]# egrep -v $'^#|^$|^[ ]|^[\t]' d.txt 
aaaaaaa
bbbbbbb
[root@hass-11 ~]# egrep -v $'^[# \t]' d.txt 
aaaaaaa
bbbbbbb

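Deleting blank lines

Blank lines here include empty lines and lines made up only of spaces or TABs

1. tr: removes empty lines only
cat filename |tr -s '\n'

2. grep: removes empty lines only
grep -v '^$' filename

3. sed: removes empty lines only
cat filename |sed '/^$/d' 

4. awk: removes empty lines only
cat filename |awk '{if($0!="")print}'
cat filename |awk '{if(length !=0) print $0}'

Deleting lines made up of spaces and TABs

1.
[root@hass-11 num]# egrep -v '^#|^[[:space:]]*$' /etc/passwd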
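
Comments and every kind of blank line can also be dropped with one extended regular expression. The following is a sketch on an invented sshd_config-style sample, not on a real configuration file:

```shell
tmp=$(mktemp)
printf '# comment\nPort 22\n\n   \n\t\n  # indented comment\nPermitRootLogin no\n' > "$tmp"

# ^[[:space:]]*   optional leading whitespace
# (#|$)           followed by a comment marker or the end of the line
active=$(grep -E -v '^[[:space:]]*(#|$)' "$tmp")

printf '%s\n' "$active"
rm -f "$tmp"
```

Unlike '^#', this pattern also removes comments that are indented.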

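Exercises

1. Filter the lines whose user name consists of letters + digits + letters
egrep '^[a-zA-Z]+[0-9]+[a-zA-Z]+:' /etc/passwd
2. Filter out all comments and all blank lines in /etc/ssh/sshd_config
grep -v '^#' /etc/ssh/sshd_config |grep -v '^ *$' |grep -v $'^[\t]*$' |grep -v '^$'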

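POSIX character classes

Purpose: named classes that express common character sets with a short, readable string

# Expression   Meaning                                           Example
[:alnum:]      letters and digits                                [[:alnum:]]+
[:alpha:]      letters (uppercase and lowercase)                 [[:alpha:]]{4}
[:blank:]      space and tab                                     [[:blank:]]*
[:digit:]      digits                                            [[:digit:]]?
[:lower:]      lowercase letters                                 [[:lower:]]{5,}
[:upper:]      uppercase letters                                 [[:upper:]]+
[:punct:]      punctuation characters                            [[:punct:]]
[:space:]      all whitespace, including newline and return      [[:space:]]+
[:graph:]      all visible characters (the space is excluded)
[:print:]      all visible characters plus the space
[:cntrl:]      control characters
[:xdigit:]     hexadecimal digits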

# Details
[:alnum:] Alphanumeric characters.
Matches [a-zA-Z0-9]

[:alpha:] Alphabetic characters.
Matches [a-zA-Z]

[:blank:] Space or tab characters.
Matches the space and the TAB

[:cntrl:] Control characters.
Matches control characters, for example ^M (carriage return), which is typed as Ctrl+V followed by Enter

[:digit:] Numeric characters.
Matches all digits, [0-9]

[:graph:] Characters that are both printable and visible. (A space is printable, but not visible, while an a is both.)
Matches every visible character, excluding the space and the TAB: every symbol you can type into a text file and actually see

[:lower:] Lower-case alphabetic characters.
Lowercase, [a-z]

[:print:] Printable characters (characters that are not control characters.)
Matches every visible character plus the space; the TAB is a control character and is not included

[:punct:] Punctuation characters (characters that are not letter, digits, control characters, or space characters).
Special symbols such as +-=)(*&^%$#@!~`|"'{}[]:;?/>.<,
Note that it does not include the space or the TAB
This set is not the same as [^a-zA-Z0-9]

[:space:] Space characters (such as space, tab, and formfeed, to name a few).

[:upper:] Upper-case alphabetic characters.
Uppercase, [A-Z]

[:xdigit:] Characters that are hexadecimal digits.
Hexadecimal digits, [0-9A-Fa-f]

# Usage:
[root@egon ~]# grep '[[:alnum:]]' /etc/passwd
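
A class name is only valid inside a bracket expression, hence the double brackets. A short sketch on invented lines, counting how many lines each class selects:

```shell
tmp=$(mktemp)
printf 'xyz123\nHELLO\nfoo bar\nff00aa\nhi!\n' > "$tmp"

alnum_only=$(grep -c '^[[:alnum:]]*$' "$tmp")   # xyz123, HELLO, ff00aa
upper_only=$(grep -c '^[[:upper:]]*$' "$tmp")   # HELLO
has_space=$(grep -c '[[:space:]]' "$tmp")       # foo bar
hex_only=$(grep -c '^[[:xdigit:]]*$' "$tmp")    # ff00aa
has_punct=$(grep -c '[[:punct:]]' "$tmp")       # hi!

echo "$alnum_only $upper_only $has_space $hex_only $has_punct"
rm -f "$tmp"
```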