Learning Regular Expressions with grep (Part 1)

The three text-processing tools
grep
Text filtering
sed
Stream editing
awk
Formatted output

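Regular expressions
A regular expression is a pattern built from characters, some of which (the metacharacters) carry a special meaning.

In many tools (sed and awk, for example) a regular expression is written between two forward slashes. For example, /l[o0]ve/ is a regular expression delimited by slashes, and it matches that pattern wherever it appears in the line being searched.
Metacharacters are the most important concept in regular expressions.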
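To make the /l[o0]ve/ example concrete, here is a minimal sketch that runs the same pattern through grep, sed, and awk. The three sample lines are made up for illustration and are not part of these notes.

```shell
# Hypothetical sample lines, not taken from the notes.
sample='I love grep
l0ve with a zero
live and learn'

# grep takes the pattern without the surrounding slashes.
with_grep=$(printf '%s\n' "$sample" | grep 'l[o0]ve')

# sed and awk use the /.../ delimiters described above.
with_sed=$(printf '%s\n' "$sample" | sed -n '/l[o0]ve/p')
with_awk=$(printf '%s\n' "$sample" | awk '/l[o0]ve/')

printf '%s\n' "$with_grep"
```

All three commands should select the first two lines: [o0] accepts either the letter o or the digit 0, so "live" is skipped.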

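What regular expressions are for:

In day-to-day Linux operations work we constantly deal with large amounts of text: configuration files, programs, command output, log files, and so on. We often urgently need
to pull out just the strings that meet a particular requirement from all of that content, and that is the job of regular expressions. In other words, regular expressions exist
to filter strings.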

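1) On Linux, regular expressions are used mainly with grep, sed, and awk
2) They suit processing large numbers of text files non-interactively
3) They filter, match, and print strings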

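Notes on regular expressions:
1) Regular expressions are used very widely and exist in many languages, for example PHP, Java, and Python
2) Regular expressions and shell wildcards are fundamentally different, even where they share characters
3) To learn grep, sed, and awk well, you must first master regular expressions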
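The difference in note 2 is easiest to see with the * character. The sketch below applies the same text, host*, once as a regular expression (through grep) and once as a shell wildcard (through a case pattern). The name list is a made-up example.

```shell
# Hypothetical list of names used only for this comparison.
names='hosts
host.conf
hos
ghost
hst'

# As a regex, host* means "hos" followed by zero or more "t",
# and it may match anywhere in the line.
regex_hits=$(printf '%s\n' "$names" | grep 'host*')

# As a wildcard, host* means "starts with host, then anything".
glob_hits=$(printf '%s\n' "$names" | while IFS= read -r n; do
  case $n in
    host*) printf '%s\n' "$n" ;;
  esac
done)

printf 'regex:\n%s\nwildcard:\n%s\n' "$regex_hits" "$glob_hits"
```

The regex selects hosts, host.conf, hos, and ghost, while the wildcard selects only hosts and host.conf.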

Wildcards

1)*   
    Any string of characters (including none)
[root@xiaoming re]# ls /etc/*host*
/etc/host.conf  /etc/hostname  /etc/hosts  /etc/hosts.allow  /etc/hosts.deny
    
2)?
    Any single character
[root@xiaoming re]# ls /etc/host?
/etc/hosts
    
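3)[]
    One character from a range: [a-z] [A-Z] [0-9] (whether [a-z] also matches uppercase letters depends on the locale's collation order)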
[root@xiaoming re]# ls
14asd  55ad  5Asad  ASASD  asdas  Dasd  sasdA  sddd    
[root@xiaoming re]# ls [a-z]*
ASASD  asdas  Dasd  sasdA  sddd
[root@xiaoming re]# ls [0-9][0-9]*
14asd  55ad
[root@xiaoming re]# ls [a-z][A-Z][a-z]*
ASASD  asdas  sddd
[root@xiaoming re]# ls [a-zA-Z0-9]*
14asd  55ad  5Asad  ASASD  asdas  Dasd  sasdA  sddd
    
4){}
    Sequence (brace expansion)
[root@xiaoming re]# touch {1..10}
[root@xiaoming re]# ls
1  10  2  3  4  5  6  7  8  9

[root@xiaoming re]# touch {01..10}
[root@xiaoming re]# ls
01  02  03  04  05  06  07  08  09  10

// Script for adding users in bulk
[root@xiaoming re]# cat useradd.sh 
for i in {00..10}
do 
useradd oldboy$i
done
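The range results in the wildcard listings above depend on the locale, which is why [a-z]* can also pick up names such as ASASD. As a rough sketch, the same file names can be recreated in a scratch directory and matched under the C locale, where ranges follow plain ASCII order. The scratch directory and variable names are illustrative only.

```shell
# Use the C locale so that ranges follow plain ASCII order.
export LC_ALL=C

# Hypothetical scratch directory holding the file names from the listing above.
dir=$(mktemp -d)
(cd "$dir" && touch 14asd 55ad 5Asad ASASD asdas Dasd sasdA sddd)

# Each wildcard is expanded by the shell before echo runs.
lower_first=$(cd "$dir" && echo [a-z]*)
upper_first=$(cd "$dir" && echo [A-Z]*)
two_digits=$(cd "$dir" && echo [0-9][0-9]*)
one_more=$(cd "$dir" && echo sdd?)

rm -rf "$dir"
printf '%s\n' "$lower_first" "$upper_first" "$two_digits" "$one_more"
```

Under LC_ALL=C, [a-z]* should list only asdas, sasdA, and sddd, and [A-Z]* only ASASD and Dasd.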

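Special characters:

""     Weak quoting: special characters such as $ are still expanded
''     Strong quoting: what you see is what you get
``=$() Command substitution: the command inside runs first

#      Comment line; also the prompt symbol for root
$      Variable; also the prompt symbol for an ordinary user

;      cmd1;cmd2 runs cmd1, then runs cmd2
&&     cmd1 && cmd2 runs cmd2 only if cmd1 succeeds
||     cmd1 || cmd2 runs cmd2 only if cmd1 fails

!      Negation
       In history, recalls an earlier command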
       
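Redirection
<       Read standard input from a file
>       Write standard output to a file, overwriting it
>>      Append standard output to a file
2>      Write standard error to a file, overwriting it
2>>     Append standard error to a file
&>/dev/null     Discard both standard output and standard error (bash); the same effect can be written as:
    1>/dev/null 2>&1
    1>/dev/null 2>/dev/null

Pipe
|       Pass the standard output of one command to the standard input of the next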

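~       Home directory
.       
        Current directory
        At the start of a file name, marks the file as hidden
        In chown, separates owner and group (owner.group; owner:group is the preferred form)
..
        Parent directory
-
        File type: regular file (first character in ls -l output)
        cd - returns to the previous directory
        su - switches user with a login environment
%
+
        In chmod, adds a permission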
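The control operators and redirections in the list above can be checked with a small sketch. The helper function both is hypothetical and exists only to produce one line on each output stream. The portable >/dev/null 2>&1 spelling is used instead of &> so that the sketch also runs in shells other than bash.

```shell
# Hypothetical helper that writes one line to stdout and one to stderr.
both() {
  echo "to stdout"
  echo "to stderr" >&2
}

out_only=$(both 2>/dev/null)        # stderr discarded, stdout captured
err_only=$(both 2>&1 >/dev/null)    # stderr captured, stdout discarded
nothing=$(both >/dev/null 2>&1)     # both streams discarded

# && runs the right-hand command only on success, || only on failure.
and_result=$(true && echo ran)
or_result=$(false || echo ran)

printf '%s\n' "$out_only" "$err_only" "$and_result" "$or_result"
```

The order of redirections matters: 2>&1 >/dev/null points stderr at the original stdout first and only then discards stdout.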

[root@xiaoming re]# name=xiaoming
[root@xiaoming re]# echo 'my name is $name'
my name is $name
[root@xiaoming re]# echo "my name is $name"
my name is xiaoming
[root@xiaoming re]# echo my name is $name
my name is xiaoming
[root@xiaoming re]# test="my name is $name"
[root@xiaoming re]# echo $name
xiaoming
[root@xiaoming re]# test1=echo my name is $name
-bash: my: command not found
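The last command in the transcript fails because the shell reads test1=echo as a temporary variable assignment and then tries to run my as a command. A minimal sketch of the working forms, using command substitution to capture the output of echo:

```shell
name=xiaoming

single='my name is $name'          # single quotes: kept literally
double="my name is $name"          # double quotes: $name is expanded

# Command substitution runs echo first and stores its output.
test1=$(echo my name is $name)
legacy=`echo my name is $name`     # backticks are the older spelling of $()

printf '%s\n' "$single" "$double" "$test1" "$legacy"
```

Only the single-quoted string keeps the text $name; the other three contain xiaoming.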

grep works line by line by default

Command options
-i  Ignore case 
[root@xiaoming re]# grep documentroot /etc/httpd/conf/httpd.conf 
[root@xiaoming re]# grep -i documentroot /etc/httpd/conf/httpd.conf 
# DocumentRoot: The directory out of which you will serve your
DocumentRoot "/var/www/html"
    # access content that does not live under the DocumentRoot.
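--color=auto Highlight matches
    CentOS 6 does not define this alias by default; you can add it yourself    
-o  Print only the part of each line that matches
-v  Invert the match (print the lines that do not match)
-E  Use extended regular expressions (same as egrep)
-q  Quiet: print nothing
    Use it when you do not need the output and only want the exit status to tell whether a match was found
-n  Show line numbers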
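The options above can be tried without touching a real httpd.conf. This sketch writes a three-line stand-in file (hypothetical content modeled on the transcript) and applies each option to it. It also uses -c, a standard grep option not listed in these notes, to count matching lines.

```shell
# Hypothetical stand-in for httpd.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# DocumentRoot: The directory out of which you will serve your
DocumentRoot "/var/www/html"
Listen 80
EOF

insensitive=$(grep -i -c documentroot "$conf")     # -i: case-insensitive match count
numbered=$(grep -n 'Listen' "$conf")               # -n: prefix the line number
inverted=$(grep -v -i documentroot "$conf")        # -v: lines that do not match
only_match=$(grep -o '/var/www/[a-z]*' "$conf")    # -o: matched text only

# -q prints nothing; the exit status alone reports the result.
if grep -q documentroot "$conf"; then found=yes; else found=no; fi

rm -f "$conf"
printf '%s\n' "$insensitive" "$numbered" "$inverted" "$only_match" "$found"
```

Because the plain search for documentroot is case-sensitive, found ends up as no, which mirrors the empty result of the first grep in the transcript.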

Regular expressions in practice

Regular expression flavors:
BRE: basic regular expressions
grep
ERE: extended regular expressions
egrep
grep -E

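Basic regular expressions

Character matching:
.           Matches any single character (so it never matches an empty line)
[ ]         Matches any single character in the given set
[^ ]        Matches any single character not in the given set
[[:space:]] Whitespace characters
[[:digit:]] All digits              [0-9]
[[:lower:]] All lowercase letters   [a-z]
[[:upper:]] All uppercase letters   [A-Z]
[[:alpha:]] All letters             [a-zA-Z] (avoid [a-Z]: it is locale-dependent and an invalid range in the C locale)
[[:alnum:]] All letters and digits  [a-zA-Z0-9]
[[:punct:]] All punctuation characters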
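As a quick check of the character classes above, this sketch pulls different kinds of characters out of one line from the sample file using grep -o. The variable names are illustrative.

```shell
line='my qq num is 191868516.'

# One or more digits, letters, or a single punctuation character.
digits=$(printf '%s\n' "$line" | grep -o '[[:digit:]][[:digit:]]*')
words=$(printf '%s\n' "$line" | grep -o '[[:alpha:]][[:alpha:]]*')
punct=$(printf '%s\n' "$line" | grep -o '[[:punct:]]')

# A negated set: anything that is neither alphanumeric nor whitespace.
other=$(printf '%s\n' "$line" | grep -o '[^[:alnum:][:space:]]')

# '.' needs one character, so an empty line does not match it.
if printf '\n' | grep -q '.'; then dot_on_empty=match; else dot_on_empty=no-match; fi

printf '%s\n' "$digits" "$words" "$punct" "$other" "$dot_on_empty"
```

Several classes can share one bracket expression, as in [^[:alnum:][:space:]].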

Exercises:
Sample file:

[root@xiaoming re]# cat re.txt 
I am oldboy teacher！
I teach linux.
test

I like badminton ball ,billiard ball and chinese chess!
my blog is http://oldboy.blog.51cto.com
our site is http://www.oldboy.com
my qq num is 191868516.
not 191886888516.

Case 1: verify that '.' matches any single character; use -o to check

[root@xiaoming re]# grep '.' re.txt 
I am oldboy teacher！
I teach linux.
test
I like badminton ball ,billiard ball and chinese chess!
my blog is http://oldboy.blog.51cto.com
our site is http://www.oldboy.com
my qq num is 191868516.
not 191886888516.

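Case 2: verify that however many characters appear inside [], they are alternatives, and [] matches exactly one character

// Filter lines containing a lowercase letter immediately followed by a digit
grep '[a-z][0-9]' re.txt 
// Filter lines containing a single character that is a lowercase letter or the digit 8
grep '[a-z8]' re.txt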

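Repetition: a quantifier is placed after the character whose count it specifies and limits how many times that preceding character may occur

*           Matches the preceding character any number of times: 0, 1, or more;
                Example: grep "x*y"
                        abxy, aby, xxxxy, yab                
.*          Matches any characters of any length; greedy
\?          Matches the preceding character 0 or 1 times, i.e. the character is optional (GNU extension in BRE; written ? in ERE);
\+          Matches the preceding character 1 or more times, i.e. the character must occur at least once (GNU extension in BRE; written + in ERE);
\{m\}       Matches the preceding character exactly m times; m is a non-negative integer;
\{m,n\}     Matches the preceding character at least m and at most n times, m<=n;
\{0,n\}     At most n times;
\{m,\}      At least m times;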
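With plain grep (BRE), the characters ?, +, { and } are ordinary characters unless they are escaped with a backslash, while grep -E (ERE) treats them as quantifiers directly. The sketch below compares the two spellings on the two numeric lines from the sample file. It assumes GNU grep, since \+ in BRE is a GNU extension.

```shell
nums='my qq num is 191868516.
not 191886888516.'

# The same quantifier in BRE (escaped) and ERE (unescaped).
bre_plus=$(printf '%s\n' "$nums" | grep -o '8\+')
ere_plus=$(printf '%s\n' "$nums" | grep -E -o '8+')
bre_two=$(printf '%s\n' "$nums" | grep -o '8\{2\}')
ere_two=$(printf '%s\n' "$nums" | grep -E -o '8{2}')

# Without the backslash, BRE treats + as a literal plus sign.
literal=$(printf '%s\n' '8+8=16' | grep -o '8+')

# x*y: zero or more x, then y (abc has no y and is dropped).
star=$(printf '%s\n' abxy aby xxxxy yab abc | grep 'x*y')

printf '%s\n' "$bre_plus" "$bre_two" "$literal" "$star"
```

Both spellings of 8+ print 8, 8, 88, 888, and both spellings of 8{2} print 88 twice.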

Case 1: greedy matching

[root@xiaoming re]# grep -o "8*" re.txt
8
8
88
888
[root@xiaoming re]# grep -o "8" re.txt
8
8
8
8
8
8
8
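// Filter runs of consecutive lowercase letters and digits (* also accepts zero characters, so every line matches; add -o to print only the runs)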
[root@xiaoming re]#  grep '[a-z0-9]*' re.txt

Case 2: filter ID card numbers
Analysis: an ID number has 18 characters; the first 17 must be digits and the last is a digit or X

grep '^[0-9]\{17\}[0-9X]$' 1.txt

Sample file:
12121212125151515X
11111111111111111111
22222222222222222A
111111111111111111111111asd
11111111111111111a
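The ID-number check can be run end to end against the sample lines. This sketch writes them to a temporary file (standing in for 1.txt) and applies the pattern in both its BRE and ERE spellings.

```shell
# Temporary stand-in for 1.txt with the sample lines from above.
ids=$(mktemp)
cat > "$ids" <<'EOF'
12121212125151515X
11111111111111111111
22222222222222222A
111111111111111111111111asd
11111111111111111a
EOF

# 17 digits, then a digit or X, anchored to the whole line.
bre_ids=$(grep '^[0-9]\{17\}[0-9X]$' "$ids")
ere_ids=$(grep -E '^[0-9]{17}[0-9X]$' "$ids")

rm -f "$ids"
printf '%s\n' "$bre_ids"
```

Only 12121212125151515X passes. The ^ and $ anchors are what reject the lines that are too long.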

Case 3: zero or one occurrence

[root@xiaoming re]# grep "8\?" re.txt
I am oldboy teacher！
I teach linux.
test

I like badminton ball ,billiard ball and chinese chess!
my blog is http://oldboy.blog.51cto.com
our site is http://www.oldboy.com
my qq num is 191868516.
not 191886888516.
[root@xiaoming re]# grep -o "8\?" re.txt
8
8
8
8
8
8
8

Case 4:

[root@xiaoming re]# grep -o "8\+" re.txt
8
8
88
888
[root@xiaoming re]# grep "8\+" re.txt
my qq num is 191868516.
not 191886888516.

Case 5:

[root@xiaoming re]# grep  "8\{1\}" re.txt
my qq num is 191868516.
not 191886888516.
[root@xiaoming re]# grep -o "8\{1\}" re.txt
8
8
8
8
8
8
8
[root@xiaoming re]# grep -o "8\{2\}" re.txt
88
88
[root@xiaoming re]# grep -o "8\{3\}" re.txt
888

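Position anchors:

^           Matches the start of a line; in awk, ^ matches the leftmost end of the string being tested
$           Matches the end of a line; in awk, $ matches the rightmost end of the string being tested
^PATTERN$   Matches a whole line against PATTERN
    ^$      Empty line
    ^[[:space:]]*$  Blank line (empty or whitespace only)
    
     Word: a run of consecutive letters, digits, and underscores
     \< or \b: word-start anchor, placed on the left of a word
     \> or \b: word-end anchor, placed on the right of a word
     \<PATTERN\>: matches a whole word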
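A short sketch of the word anchors, assuming GNU grep (\< and \> are not part of POSIX). The three lines are invented for illustration. The -w option, which is not listed in these notes, is a standard grep shortcut for whole-word matching.

```shell
# Hypothetical lines: a whole word, a longer word, and a word with a prefix.
text='oldboy is here
oldboy1 is not the same user
myoldboy either'

whole_word=$(printf '%s\n' "$text" | grep '\<oldboy\>')   # both anchors
word_flag=$(printf '%s\n' "$text" | grep -w 'oldboy')      # same idea via -w
word_start=$(printf '%s\n' "$text" | grep '\<oldboy')      # must start a word
word_end=$(printf '%s\n' "$text" | grep 'oldboy\>')        # must end a word

printf '%s\n' "$whole_word"
```

Only the first line contains oldboy as a complete word. The start anchor alone also accepts oldboy1, and the end anchor alone also accepts myoldboy.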

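Case 1: filter lines that start with a lowercase letter
grep "^[a-z]" re.txt
grep "^[[:lower:]]" re.txt

Case 2: filter lines that do not start with a lowercase letter
grep "^[^a-z]" re.txt   # does not print empty lines; recommended
grep -v "^[a-z]" re.txt
Note: inside [], ^ and $ are not anchors; a leading ^ negates the set, and anywhere else they are literal characters

Case 3: filter lines that start with an uppercase letter
grep "^[A-Z]" re.txt
grep "^[[:upper:]]" re.txt

Case 4: filter lines that end with m
grep "m$" re.txt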

Show special (non-printing) characters in a file:
[root@xiaoming re]# cat -A re.txt 

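Case 5: filter lines that end with .
grep "\.$" re.txt
grep '[.]$' re.txt 
Note: . stands for any single character, so it must be escaped (or placed inside []) to match a literal dot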

Case 6: filter lines that do not end with .
grep '[^.]$' re.txt 
grep -v '[.]$' re.txt 
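The effect of escaping the dot in cases 5 and 6 can be seen on three lines from the sample file. This is a small sketch; the variable names are illustrative.

```shell
lines='I teach linux.
test
not 191886888516.'

# An unescaped dot accepts any last character, so every line matches.
any_end=$(printf '%s\n' "$lines" | grep -c '.$')

# Escaped or bracketed, the dot is literal.
dot_end=$(printf '%s\n' "$lines" | grep '\.$')
bracket_end=$(printf '%s\n' "$lines" | grep '[.]$')

# [^.]$ requires a last character that is not a dot.
no_dot=$(printf '%s\n' "$lines" | grep '[^.]$')

printf '%s\n' "$any_end" "$dot_end" "$no_dot"
```

Note that [^.]$ never matches an empty line, because it still needs one character, whereas grep -v '[.]$' also prints empty lines.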

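Case 7: filter blank lines and show their line numbers
grep -n "^$" re.txt
The pattern above finds only truly empty lines, not lines that contain only spaces; the pattern below handles both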
[root@xiaoming re]# grep -n "^[[:space:]]*$" re.txt
4:

Case 8: filter non-empty lines (an empty line contains nothing at all)
grep -v "^$" re.txt
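The difference between an empty line and a whitespace-only line in cases 7 and 8 is easy to miss on screen. This sketch builds a four-line temporary file (made up for illustration) whose second line is empty and whose third line holds three spaces.

```shell
# Hypothetical file: line 2 is empty, line 3 contains only spaces.
f=$(mktemp)
printf 'first\n\n   \nlast\n' > "$f"

# Line numbers reported by each pattern.
empty_lines=$(grep -n '^$' "$f" | cut -d: -f1)
blank_lines=$(grep -n '^[[:space:]]*$' "$f" | cut -d: -f1)

# Inverting each pattern keeps a different set of lines.
non_empty=$(grep -c -v '^$' "$f")
non_blank=$(grep -v '^[[:space:]]*$' "$f")

rm -f "$f"
printf '%s\n' "$empty_lines" "$blank_lines" "$non_empty" "$non_blank"
```

^$ reports only line 2, while ^[[:space:]]*$ reports lines 2 and 3. Accordingly, grep -v "^$" still keeps the whitespace-only line.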
     
Case 9:
oldboy:x:1013:1013::/home/oldboy:/bin/bash
[root@xiaoming re]# grep "oldboy>" /etc/passwd
[root@xiaoming re]# grep "oldboy" /etc/passwd
oldboy:x:1013:1013::/home/oldboy:/bin/bash

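Extended regular expressions
Differences from basic regular expressions
grep -E
egrep

Character matching:
|           Alternation: matches either side of |; ab(c|d) matches abc or abd
  
Repetition: a quantifier is placed after the character whose count it specifies and limits how many times that preceding character may occur
*           Matches the preceding character any number of times: 0, 1, or more;
                Example: grep "x*y"
                        abxy, aby, xxxxy, yab                
.*          Matches any characters of any length; greedy
?           Matches the preceding character 0 or 1 times, i.e. the character is optional;
+           Matches the preceding character 1 or more times, i.e. the character must occur at least once;
()          Grouping: treats the enclosed expression as one unit and captures the substring it matches;
{m}         Matches the preceding character exactly m times; m is a non-negative integer;
{m,n}       Matches the preceding character at least m and at most n times, m<=n;
{0,n}       At most n times;
{m,}        At least m times;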
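Alternation and grouping from the table above, sketched on a few made-up strings. The BRE line relies on GNU grep, where \| and \( \) provide the same features with backslashes.

```shell
words='abc
abd
abe
ab'

# ab followed by either c or d.
alt=$(printf '%s\n' "$words" | grep -E 'ab(c|d)')
bre_alt=$(printf '%s\n' "$words" | grep 'ab\(c\|d\)')   # GNU BRE spelling

# With a group, + repeats the whole "ab"; without it, only the "b".
grouped=$(printf '%s\n' ababab abb | grep -E '^(ab)+$')
ungrouped=$(printf '%s\n' ababab abb | grep -E '^ab+$')

printf '%s\n' "$alt" "$grouped" "$ungrouped"
```

The group changes what the quantifier applies to: ^(ab)+$ accepts ababab, while ^ab+$ accepts abb.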

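Homework
Basic regular expression exercises:

1. Find lines in /etc/passwd that contain a two-digit or three-digit number
grep '\<[0-9]\{2,3\}\>' /etc/passwd

2. In /etc/grub2.cfg, find lines that begin with at least one whitespace character followed by a non-whitespace character
grep '^[[:space:]]\+[^[:space:]]' /etc/grub2.cfg 

3. In the output of "netstat -ant", find lines that end with "LISTEN" followed by zero, one, or more whitespace characters
netstat -ant | grep 'LISTEN[[:space:]]*$'
netstat -ant | grep -i 'listen[[:space:]]*$'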

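Extended regular expression exercises:

1. Find lines in /etc/passwd that contain a two-digit or three-digit number
egrep '\<[0-9]{2,3}\>' /etc/passwd
grep -E '\<[0-9]{2,3}\>' /etc/passwd

2. In /etc/grub2.cfg, find lines that begin with at least one whitespace character followed by a non-whitespace character
egrep '^[[:space:]]+[^[:space:]]' /etc/grub2.cfg 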

3. In /proc/meminfo, find all lines that start with an uppercase or lowercase s, in at least three ways
egrep '^(s|S)' /proc/meminfo 
egrep '^[sS]' /proc/meminfo 
egrep -i '^s' /proc/meminfo

4. Show the entries for the users root, CentOS, or user1 on the current system
egrep '^(root|CentOS|user1)\>' /etc/passwd

5. In /etc/init.d/functions, find lines where a word is followed by a pair of parentheses
egrep '[_[:alpha:]]+\(\)' /etc/init.d/functions 
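In ERE, parentheses are grouping operators, so a literal pair of parentheses has to be written as \(\). The sketch below applies that to a few invented lines in the style of a shell function library. They are hypothetical and not copied from /etc/init.d/functions.

```shell
# Hypothetical lines resembling shell function definitions.
sample='checkpid() {
  local i
__pids_var_run() {
echo "not a definition"'

# A run of letters or underscores followed by a literal "()".
defs=$(printf '%s\n' "$sample" | grep -E -o '[_[:alpha:]]+\(\)')

printf '%s\n' "$defs"
```

Using [_[:alpha:]] instead of a range such as a-Z keeps the pattern independent of the locale.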

6. Find the numbers between 1 and 255 in the output of ifconfig
1-9      [1-9]
10-99    [1-9][0-9]
100-199  1[0-9][0-9]
200-249  2[0-4][0-9]
250-255  25[0-5]
ifconfig | egrep '\<([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>'
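A range pattern like this is easy to get subtly wrong, so it helps to test it against boundary values instead of live ifconfig output. A minimal sketch, anchoring the pattern with ^ and $ so that each candidate must match as a whole:

```shell
# Pattern for 1-255, one alternative per sub-range.
octet='^([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$'

# Boundary values on both sides of every sub-range.
accepted=$(printf '%s\n' 0 1 9 10 99 100 199 200 249 250 255 256 300 007 |
  grep -E "$octet")

printf '%s\n' "$accepted"
```

The values 0, 256, 300, and 007 are rejected, and every boundary from 1 to 255 is accepted.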

7. Find the IP addresses in the output of ifconfig (the second -e pattern also prints interface names)
ifconfig | grep -Eo -e "(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])" -e '^(e|b)[[:lower:]]+[[:digit:]]+?'
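Since ifconfig output differs between machines, the address pattern can be tried on fixed text instead. The two inet lines below are invented sample data, and octet and ip are illustrative variable names. The sketch also shows why the dot between octets should be escaped.

```shell
# One octet (0-255), then the full dotted-quad pattern built from it.
octet='([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])'
ip="($octet\.){3}$octet"

# Hypothetical ifconfig-style lines.
sample='inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255
inet 127.0.0.1  netmask 255.0.0.0'

found=$(printf '%s\n' "$sample" | grep -E -o "$ip")

# With an unescaped dot, any character is accepted as a separator.
loose=$(printf '%s\n' '1a2b3c4' | grep -E -o "($octet.){3}$octet")

printf '%s\n' "$found"
```

The escaped pattern extracts the five addresses in order, whereas the loose variant wrongly accepts 1a2b3c4.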