Text processing: the grep command

this is a words file.
words words to be 
1 2, 3  4 , 5 , 5 6 , 6 , 7 , 7 , 8 , 8 9 , 9 , 10
beginning linux programming 4th edition
1000 222222 334 5 99999

this is a line containing pattern
,.<>?;';;;' [] {= =   | -__!@#$%^&*() !@#$$%%^&*(()*&%@(#$%))

www.regexper.com
www.google.com
www.baidu.com
www.redhat.com

Our test file is named n. As shown above, it has 13 lines.

grep searches line by line and prints each matching line.

1. Search for lines that match a given pattern

[lizhen@dhcp-128-93 shell]$ grep words n
this is a words file.
words words to be 
[lizhen@dhcp-128-93 shell]$

2. A single grep command can search several files; each output line is then prefixed with the file name

[lizhen@dhcp-128-93 shell]$ grep words n n1 n2
n:this is a words file.
n:words words to be 
n1:this is a words file.
n1:words words to be 
n2:this is a words file.
n2:words words to be 
[lizhen@dhcp-128-93 shell]$

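3. To use extended regular expressions, add the -E option or call egrep directly; recent GNU grep releases mark egrep as obsolescent, so grep -E is the safer spelling. (In a terminal with color output enabled, the matched parts are highlighted in red; the listing here shows the matched lines.)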

[lizhen@dhcp-128-93 shell]$ egrep "[a-o]+" n
this is a words file.
words words to be 
beginning linux programming 4th edition
this is a line containing pattern
www.regexper.com
www.google.com
www.baidu.com
www.redhat.com
[lizhen@dhcp-128-93 shell]$
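
To see what -E changes, here is a minimal sketch that runs the same pattern once as a basic and once as an extended regular expression. The file sample.txt, its contents, and the variable names are made up for illustration, and GNU grep is assumed.

```shell
#!/bin/bash
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'www.google.com' 'ftp.example.org' '1000 222222' > "$tmpdir/sample.txt"

# Basic regex (the default): "|" is an ordinary character, so no line matches.
# grep exits with status 1 when nothing matches, hence the "|| true".
bre_count=$(grep -c 'www|ftp' "$tmpdir/sample.txt" || true)

# Extended regex (-E): "|" means alternation, so two lines match.
ere_count=$(grep -E -c 'www|ftp' "$tmpdir/sample.txt")

echo "basic: $bre_count, extended: $ere_count"   # basic: 0, extended: 2
rm -rf "$tmpdir"
```

The same applies to + and ?, which only act as repetition operators under -E.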

4. Print only the matched part of each line, using -o

[lizhen@dhcp-128-93 shell]$ grep words n
this is a words file.
words words to be 
[lizhen@dhcp-128-93 shell]$ grep words n -o
words
words
words
[lizhen@dhcp-128-93 shell]$
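
Combined with a regular expression, -o can pull individual tokens out of a line. The sketch below extracts every run of digits from two lines borrowed from the test file; the temporary file and the variable name numbers are assumptions for the example.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
printf '%s\n' '1000 222222 334 5 99999' \
              'beginning linux programming 4th edition' > "$tmpdir/n"

# -E enables "+", -o prints each match on its own line.
numbers=$(grep -E -o '[0-9]+' "$tmpdir/n")

echo "$numbers"   # 1000, 222222, 334, 5, 99999, then 4 (from "4th")
rm -rf "$tmpdir"
```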

5. Print every line except those that contain match_pattern, using -v

[lizhen@dhcp-128-93 shell]$ grep words n -v
1 2, 3  4 , 5 , 5 6 , 6 , 7 , 7 , 8 , 8 9 , 9 , 10
beginning linux programming 4th edition
1000 222222 334 5 99999

this is a line containing pattern
,.<>?;';;;' [] {= =   | -__!@#$%^&*() !@#$$%%^&*(()*&%@(#$%))

www.regexper.com
www.google.com
www.baidu.com
www.redhat.com
[lizhen@dhcp-128-93 shell]$
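
One common use of -v, sketched here under assumptions, is filtering blank lines and comments out of a configuration file. The file app.conf and its contents are hypothetical.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
printf '%s\n' '# a comment' '' 'key=value' '   ' 'other=1' > "$tmpdir/app.conf"

# The pattern matches lines that are empty, whitespace-only, or start with "#"
# (after optional whitespace); -v keeps everything else.
kept=$(grep -v -E '^[[:space:]]*(#|$)' "$tmpdir/app.conf")

echo "$kept"   # key=value, then other=1
rm -rf "$tmpdir"
```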

6. Count the lines in a file that contain a match, using -c (-c counts matching lines only, not the number of individual matches)

[lizhen@dhcp-128-93 shell]$ grep words n -c
2
[lizhen@dhcp-128-93 shell]$ grep words n 
this is a words file.
words words to be 
[lizhen@dhcp-128-93 shell]$

7. Count the number of individual matches, using -o | wc -l

[lizhen@dhcp-128-93 shell]$ grep -o words n | wc -l
3
[lizhen@dhcp-128-93 shell]$ grep words n
this is a words file.
words words to be 
[lizhen@dhcp-128-93 shell]$
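
The difference between the two counts can be checked side by side. This sketch reuses the first two lines of the test file plus one made-up non-matching line; the variable names are assumptions.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
printf '%s\n' 'this is a words file.' 'words words to be ' 'no match here' > "$tmpdir/n"

# -c counts lines that contain at least one match.
line_count=$(grep -c words "$tmpdir/n")

# -o prints one match per line, so wc -l counts individual matches.
# tr strips the padding that some wc implementations add.
match_count=$(grep -o words "$tmpdir/n" | wc -l | tr -d ' ')

echo "lines: $line_count, matches: $match_count"   # lines: 2, matches: 3
rm -rf "$tmpdir"
```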

8. Print the line number of each matching line, using -n

[lizhen@dhcp-128-93 shell]$ grep w -n n n1
n:1:this is a words file.
n:2:words words to be 
n:10:www.regexper.com
n:11:www.google.com
n:12:www.baidu.com
n:13:www.redhat.com
n1:1:this is a words file.
n1:2:words words to be 
n1:10:www.regexper.com
n1:11:www.google.com
n1:12:www.baidu.com
n1:13:www.redhat.com
[lizhen@dhcp-128-93 shell]$

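9. Print the byte offset at which each match starts, using -b -o (offsets are counted from the start of the file, beginning at 0)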

[lizhen@dhcp-128-93 shell]$ grep words n
this is a words file.
words words to be 
[lizhen@dhcp-128-93 shell]$ grep words -b -o n
10:words
22:words
28:words
[lizhen@dhcp-128-93 shell]$
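
The offsets are easier to read next to what -b prints without -o. In this sketch, which assumes GNU grep and uses a two-line copy of the test file, -b alone reports where each matching line starts, while -b -o reports where each match starts.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
printf '%s\n' 'this is a words file.' 'words words to be ' > "$tmpdir/n"

# Offset of the start of each matching line.
line_offsets=$(grep -b words "$tmpdir/n")

# Offset of each match itself.
match_offsets=$(grep -b -o words "$tmpdir/n")

echo "$line_offsets"    # 0:this is a words file.  /  22:words words to be
echo "$match_offsets"   # 10:words  /  22:words  /  28:words
rm -rf "$tmpdir"
```

The first line is 21 characters plus a newline, which is why the second line starts at offset 22.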

10. Search several files and list only the names of the files that contain the text, using -l

[lizhen@dhcp-128-93 shell]$ grep words n n1
n:this is a words file.
n:words words to be 
n1:this is a words file.
n1:words words to be 
[lizhen@dhcp-128-93 shell]$ grep words -l n n1
n
n1
[lizhen@dhcp-128-93 shell]$

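Use -L (an uppercase L) for the opposite result: it lists the files that contain no match. Both n and n1 contain a match here, so the output is empty.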

[lizhen@dhcp-128-93 shell]$ grep words n n1
n:this is a words file.
n:words words to be 
n1:this is a words file.
n1:words words to be 
[lizhen@dhcp-128-93 shell]$ grep words -l n n1
n
n1
[lizhen@dhcp-128-93 shell]$ grep words -L n n1
[lizhen@dhcp-128-93 shell]$
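
Since both files above match, -L prints nothing. A small sketch with one matching and one non-matching file makes the contrast visible; a.txt and b.txt are hypothetical names.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
echo 'this is a words file.' > "$tmpdir/a.txt"
echo 'nothing to see here'   > "$tmpdir/b.txt"

# Run inside the directory so the output shows bare file names.
with_match=$(cd "$tmpdir" && grep -l words a.txt b.txt)
without_match=$(cd "$tmpdir" && grep -L words a.txt b.txt)

echo "contains a match: $with_match"      # a.txt
echo "contains no match: $without_match"  # b.txt
rm -rf "$tmpdir"
```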

11. Search files recursively, using -R -n (-n adds the line number after the file name, so each result reads file:line:text)

[lizhen@dhcp-128-93 shell]$ grep words . -R -n
./n:1:this is a words file.
./n:2:words words to be 
./n1:1:this is a words file.
./n1:2:words words to be 
./n2:1:this is a words file.
./n2:2:words words to be 
[lizhen@dhcp-128-93 shell]$

12. Ignore case when matching the pattern, using -i

[lizhen@dhcp-128-93 shell]$ grep WORDS -i n
this is a words file.
words words to be 
[lizhen@dhcp-128-93 shell]$

13. Match several patterns in one grep call, using one -e per pattern

[lizhen@dhcp-128-93 shell]$ grep -e words  -e www -o n
words
words
words
www
www
www
www
[lizhen@dhcp-128-93 shell]$
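
Besides combining patterns, -e also marks its argument as a pattern, which helps when the pattern starts with a hyphen and would otherwise be read as an option. A minimal sketch with made-up sample lines:

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
printf '%s\n' 'this is a words file.' 'www.google.com' \
              'use the -v flag' 'beginning linux' > "$tmpdir/sample.txt"

# A line is selected if it matches any of the -e patterns.
either=$(grep -c -e words -e www "$tmpdir/sample.txt")

# Without -e, "-v" would be parsed as the invert-match option.
dash=$(grep -c -e '-v' "$tmpdir/sample.txt")

echo "either: $either, dash: $dash"   # either: 2, dash: 1
rm -rf "$tmpdir"
```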

14. Use a pattern file with -f: grep reads one pattern per line from the file (f here) and prints every line that matches any of them

[lizhen@dhcp-128-93 shell]$ grep -f f n
this is a words file.
words words to be 
1 2, 3  4 , 5 , 5 6 , 6 , 7 , 7 , 8 , 8 9 , 9 , 10
1000 222222 334 5 99999
www.regexper.com
www.google.com
www.baidu.com
www.redhat.com
[lizhen@dhcp-128-93 shell]$
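
The contents of f are not shown above, so the following sketch uses its own hypothetical pattern file, patterns.txt, to illustrate the mechanism; it does not reproduce the original f.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
# Hypothetical pattern file: one pattern per line.
printf '%s\n' 'words' 'www' > "$tmpdir/patterns.txt"
printf '%s\n' 'this is a words file.' \
              'beginning linux programming 4th edition' \
              'www.baidu.com' > "$tmpdir/n"

# Every line matching at least one pattern from the file is printed.
matched=$(grep -f "$tmpdir/patterns.txt" "$tmpdir/n")

echo "$matched"   # this is a words file.  /  www.baidu.com
rm -rf "$tmpdir"
```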

15. Include or exclude files in a grep search, using --include and --exclude

# grep "main()" . -r --include="*.c" --include="*.cpp"

# grep "main()" . -r --exclude="readme"
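
Quoting the glob keeps the shell from expanding it before grep sees it. The sketch below builds a small hypothetical source tree (src/main.c, src/app.cpp, readme, notes.txt) and lists the files each filter selects; GNU grep is assumed.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
mkdir "$tmpdir/src"
echo 'int main() { return 0; }'  > "$tmpdir/src/main.c"
echo 'int main() { return 1; }'  > "$tmpdir/src/app.cpp"
echo 'call main() to start'      > "$tmpdir/readme"
echo 'main() is the entry point' > "$tmpdir/notes.txt"

# Only search files whose names match one of the --include globs.
included=$(cd "$tmpdir" && grep -r -l --include='*.c' --include='*.cpp' 'main()' . | LC_ALL=C sort)

# Search everything except files named readme.
excluded=$(cd "$tmpdir" && grep -r -l --exclude='readme' 'main()' . | LC_ALL=C sort)

echo "$included"   # ./src/app.cpp  /  ./src/main.c
echo "$excluded"   # ./notes.txt  /  ./src/app.cpp  /  ./src/main.c
rm -rf "$tmpdir"
```

The output is piped through sort because the order in which grep walks a directory is not guaranteed.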

16. Quiet mode, using -q: grep prints nothing and reports the result only through its exit status

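#!/bin/bash
#########################################################################
# File Name: begin.sh
# Author: lizhen
# mail: lizhen_ok@163.com
# Created Time: Wed 18 May 2016 08:29:32 PM CST
#########################################################################
# Report whether match_text occurs in filename. grep -q prints nothing
# and signals the result through its exit status (0 means a match was found).
if [ $# -ne 2 ]
then
    echo "usage: $0 match_text filename"
    exit 1
fi

match_text=$1
filename=$2

# Quote both arguments so spaces survive; "--" ends option parsing,
# so a pattern that starts with "-" is not mistaken for an option.
if grep -q -- "$match_text" "$filename"
then
    echo "The text exists in the file"
else
    echo "text does not exist in the file"
fi

echo "done!"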

[lizhen@dhcp-128-93 shell]$ ./begin.sh words n
The text exists in the file
done!
[lizhen@dhcp-128-93 shell]$
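
The exit status carries more than found or not found. As a sketch, assuming GNU grep and a made-up one-line file, the three usual values can be observed like this (-s only hides the error message for the missing file):

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
echo 'this is a words file.' > "$tmpdir/n"

# Record the exit status of each call; the "|| var=$?" form also works under set -e.
found=0;   grep -q words  "$tmpdir/n" || found=$?
missing=0; grep -q absent "$tmpdir/n" || missing=$?
failed=0;  grep -q -s words "$tmpdir/no-such-file" || failed=$?

# 0 = a line matched, 1 = no line matched, 2 = an error occurred
echo "found=$found missing=$missing failed=$failed"   # found=0 missing=1 failed=2
rm -rf "$tmpdir"
```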

17. Print the lines before or after a match, using -B (before), -A (after), and -C (both sides)

[lizhen@dhcp-128-93 shell]$ grep www -B 3 n
this is a line containing pattern
,.<>?;';;;' [] {= =   | -__!@#$%^&*() !@#$$%%^&*(()*&%@(#$%))

www.regexper.com
www.google.com
www.baidu.com
www.redhat.com
[lizhen@dhcp-128-93 shell]$ grep www -B 1 n

www.regexper.com
www.google.com
www.baidu.com
www.redhat.com
[lizhen@dhcp-128-93 shell]$ grep www n
www.regexper.com
www.google.com
www.baidu.com
www.redhat.com
[lizhen@dhcp-128-93 shell]$

[lizhen@dhcp-128-93 shell]$ grep words -A  1 n
this is a words file.
words words to be 
1 2, 3  4 , 5 , 5 6 , 6 , 7 , 7 , 8 , 8 9 , 9 , 10
[lizhen@dhcp-128-93 shell]$

[lizhen@dhcp-128-93 shell]$ grep 2222 n
1000 222222 334 5 99999
[lizhen@dhcp-128-93 shell]$ grep 2222 n -C 1
beginning linux programming 4th edition
1000 222222 334 5 99999

[lizhen@dhcp-128-93 shell]$
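
The three options are easiest to compare on a tiny file. This sketch uses a hypothetical five-line file named nums and assumes GNU grep; the last call shows the "--" line that separates context groups that are not adjacent.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
printf '%s\n' one two three four five > "$tmpdir/nums"

before=$(grep -B 1 three "$tmpdir/nums")   # two, three
after=$(grep -A 1 three "$tmpdir/nums")    # three, four
around=$(grep -C 1 three "$tmpdir/nums")   # two, three, four

# Matches on lines 1 and 5: the two context groups do not touch,
# so grep prints "--" between them.
grouped=$(grep -A 1 -e one -e five "$tmpdir/nums")   # one, two, --, five

printf '%s\n' "$before" '==' "$after" '==' "$around" '==' "$grouped"
rm -rf "$tmpdir"
```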