Chapter 12 Regular Expressions and File Formatting

Basic Regular Expressions

Effect of the locale on regular expressions

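Different locales may encode and order characters differently, for example:

LANG=C: 0 1 2 ... A B C ... a b c ...

LANG=zh_CN: 0 1 2 ... a A b B ...

Therefore the characters matched by a range such as [A-Z] can differ between locales (under zh_CN it may also pick up lowercase letters). The special symbols below describe character classes independently of that ordering: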

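Special symbol  Meaning
[:alnum:]       Letters and digits: 0-9, A-Z, a-z
[:alpha:]       Letters (upper and lower case)
[:blank:]       Space and Tab
[:cntrl:]       Control characters: CR, LF, Tab, DEL, etc.
[:digit:]       Digits 0-9
[:graph:]       All printable characters except whitespace (space and Tab)
[:lower:]       Lowercase letters
[:print:]       Printable characters
[:punct:]       Punctuation: " ' ? ; : # $ etc.
[:upper:]       Uppercase letters
[:space:]       Any character that produces whitespace (space, Tab, newline, CR, etc.)
[:xdigit:]      Hexadecimal digits: 0-9, A-F, a-f

These classes are written inside a bracket expression, for example [[:digit:]] or [^[:lower:]].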
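
A small sketch of the classes in use, assuming GNU grep and a POSIX shell. The helper name `sample` is made up for this illustration. Setting LC_ALL=C pins the collation order, so [A-Z] and [[:upper:]] select the same lines; outside the C locale the behaviour of ranges can depend on the libc version.

```shell
# Hypothetical helper that prints four sample lines (line 4 contains a real tab).
sample() { printf 'apple\nBanana\n42 cherries\ntab\there\n'; }

# LC_ALL=C pins the collation order, so [A-Z] means only the ASCII letters A..Z.
sample | LC_ALL=C grep -n '[[:upper:]]'   # 2:Banana
sample | LC_ALL=C grep -n '[A-Z]'         # 2:Banana (same as above in the C locale)
sample | LC_ALL=C grep -n '[[:digit:]]'   # 3:42 cherries
sample | LC_ALL=C grep -n '[[:blank:]]'   # 3:42 cherries and 4:tab<TAB>here
```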

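Advanced grep options

Besides the basic usage introduced in the previous chapter, grep has some more advanced options.

grep [-A n] [-B n] [--color=auto] 'search_string' filename

Options:

-A: followed by a number n ("after"); besides the matching line, the n lines after it are also listed

-B: followed by a number n ("before"); besides the matching line, the n lines before it are also listed

--color=auto: highlight the matched text in color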

// -n shows line numbers
[root@localhost 桌面]# dmesg | grep -n --color=auto 'eth'
1730:[   10.210383] e1000 0000:02:01.0 eth0: (PCI:66MHz:32-bit) 00:0c:29:7f:dd:91
1731:[   10.210404] e1000 0000:02:01.0 eth0: Intel(R) PRO/1000 Network Connection

Note: when grep finds the string, it always prints the entire line that contains it.
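
The -A and -B options are not demonstrated above, so here is a minimal sketch on a made-up five-line input (the helper `lines` is hypothetical), assuming GNU grep. With -n, GNU grep tags the matching line with "N:" and the context lines with "N-".

```shell
# Hypothetical five-line input used only for this demo.
lines() { printf 'one\ntwo\nthree\nfour\nfive\n'; }

# -B1: one line of context before the match; -A1: one line after it.
lines | grep -n -B1 -A1 'three'
# 2-two
# 3:three
# 4-four
```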


Basic regular expression exercises

The practice text is as follows:

[root@localhost 桌面]# cat regular_express.txt
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
However, this dress is about $ 3183 dollars.
GNU is free air not free beer.
Her hair is very beauty.
I can't finish the test.
Oh! The soup taste good.
motorcycle is cheap than car.
This window is clear.
the symbol '*' is represented as start.
Oh!    My god!
The gd software is a library for drafting programs.
You are the best is mean you are the no. 1.
The world <Happy> is the same with "glad".
I like dog.
google is the best tools for search keyword.
goooooogle yes!
go! go! Let's go.
# I am VBird

[root@localhost 桌面]# 

Example 1: searching for a specific string

// Find lines containing "the"
[root@localhost 桌面]# grep -n 'the' regular_express.txt
8:I can't finish the test.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.

// Find lines that do not contain "the" (-v inverts the match)
[root@localhost 桌面]# grep -vn 'the' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
5:However, this dress is about $ 3183 dollars.
6:GNU is free air not free beer.
7:Her hair is very beauty.
9:Oh! The soup taste good.
10:motorcycle is cheap than car.
11:This window is clear.
13:Oh!    My god!
14:The gd software is a library for drafting programs.
17:I like dog.
19:goooooogle yes!
20:go! go! Let's go.
21:# I am VBird
22:
[root@localhost 桌面]# 

Example 2: using brackets [] to match one character from a set

// Find the strings "tast" or "test"
[root@localhost 桌面]# grep -n 't[ae]st' regular_express.txt
8:I can't finish the test.
9:Oh! The soup taste good.

// Find "oo" not preceded by g (lines 18 and 19 still match, through "tools" and "ooo")
[root@localhost 桌面]# grep -n '[^g]oo' regular_express.txt
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!

// Find lines containing a digit
[root@localhost 桌面]# grep -n '[0-9]' regular_express.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.

// Find "oo" not preceded by a lowercase letter
[root@localhost 桌面]# grep -n '[^[:lower:]]oo' regular_express.txt
3:Football game is not use feet only.
[root@localhost 桌面]# 
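
A negated set still has to match one character. The sketch below (sample words chosen for illustration, GNU grep assumed) shows that [^g]oo does not match "oo" at the very start of a line, because there is no character there for [^g] to consume.

```shell
# [^g]oo needs exactly one character before "oo", and that character must not be g.
printf 'food\ngood\nooze\n' | grep -n '[^g]oo'
# 1:food
#   "good": the only "oo" is preceded by g
#   "ooze": the "oo" starts the line, so there is no character for [^g] to match
```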

Example 3: line-start and line-end anchors ^ and $

// Lines starting with "the"
[root@localhost 桌面]# grep -n '^the' regular_express.txt
12:the symbol '*' is represented as start.

// Lines starting with a lowercase letter
[root@localhost 桌面]# grep -n '^[a-z]' regular_express.txt
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.

// Lines ending with a period (the . must be escaped as \.)
[root@localhost 桌面]# grep -n '\.$' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
11:This window is clear.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
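20:go! go! Let's go.

Lines 5-9 and 14 also seem to end with a period but are not listed. This is consistent with those lines ending in a Windows-style carriage return (^M) in the sample file, so "." is not actually the last character; cat -A regular_express.txt would reveal it.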

// Find empty lines
[root@localhost 桌面]# grep -n '^$' regular_express.txt
22:
[root@localhost 桌面]# 
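
A sketch of the carriage-return effect on $, using a made-up two-line input (the helper `crlf_sample` is hypothetical) and GNU grep on Linux, where \r is an ordinary character. Stripping CRs with tr is one common workaround.

```shell
# Hypothetical two-line input: the second line ends with a Windows-style CR (\r).
crlf_sample() { printf 'unix line.\nwindows line.\r\n'; }

crlf_sample | grep -c '\.$'                 # 1: on line 2 the CR, not ".", is last
crlf_sample | tr -d '\r' | grep -c '\.$'    # 2: after stripping CRs both lines match
```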

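Example 4: the any-character . and the repetition character *

. (period): matches exactly one arbitrary character

*: matches the preceding character repeated zero or more times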

// Find g, followed by exactly two characters, followed by d
[root@localhost 桌面]# grep -n 'g..d' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
9:Oh! The soup taste good.
16:The world <Happy> is the same with "glad".

// Find at least two o's: "oo" followed by zero or more further o's
[root@localhost 桌面]# grep -n 'ooo*' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!
[root@localhost 桌面]# 
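
The same three ideas on a smaller, made-up word list (the helper `words` is hypothetical), assuming GNU grep. The last pattern combines the two metacharacters: .* means "any run of characters, possibly empty".

```shell
# Hypothetical sample lines for the . and * metacharacters.
words() { printf 'gd\ngood\ngoooooogle\ngo!\n'; }

words | grep -n 'g..d'   # 2:good        (g, exactly two characters, d)
words | grep -n 'ooo*'   # 2:good and 3:goooooogle (two o's, then zero or more o's)
words | grep -n 'g.*g'   # 3:goooooogle  (g, any run of characters, g)
```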

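Example 5: limiting the number of repetitions with {}

In a basic regular expression the braces must be escaped as \{ and \}.

// Find two consecutive o's (longer runs also match, since nothing else is required around them)
[root@localhost 桌面]# grep -n 'o\{2\}' regular_express.txt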
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!

// Find 2 to 5 consecutive o's
[root@localhost 桌面]# grep -n 'o\{2,5\}' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!

// Find g, followed by two or more o's, followed by g
[root@localhost 桌面]# grep -n 'go\{2,\}g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!
[root@localhost 桌面]# 
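
To see why the escaping matters, this sketch (GNU grep assumed; the helper `gwords` and its lines are made up) bounds the count with g on both sides and shows that unescaped braces are just literal characters in a basic regular expression.

```shell
# Hypothetical sample: the last line literally contains the text "o{2}".
gwords() { printf 'google\ngoooooogle\ngogle\no{2}\n'; }

gwords | grep -n 'go\{2\}g'    # 1:google  (exactly two o's between the g's)
gwords | grep -n 'go\{2,\}g'   # 1:google and 2:goooooogle (two or more o's)
gwords | grep -n 'o{2}'        # 4:o{2}    (unescaped braces are ordinary characters)
```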

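Summary of basic regular expression characters

The five examples above can be summarized as follows:

RE character   Meaning
^word          The search string is at the start of the line
word$          The search string is at the end of the line
.              Exactly one arbitrary character
\              Escape character: removes the special meaning of the next character
*              The preceding character repeated zero or more times
[list]         One character from the set in list
[n1-n2]        One character from the range n1 to n2
[^list]        One character that is NOT in list
\{n\}          The preceding character repeated exactly n times
\{n,\}         The preceding character repeated n or more times
\{n,m\}        The preceding character repeated n to m times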


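The sed tool

sed is also a pipe command: it can analyze standard input (such as the output of a previous command), and it can replace, delete, add, and select specific lines.

sed [-nefri] [action]

Options:

-n: quiet mode. By default, every line read from STDIN is printed to the screen; with -n, only the lines specially processed by sed (for example with p) are printed

-e: specify a sed action directly on the command line (can be given more than once)

-f: read the sed actions from a file; -f filename runs the actions written in filename

-r: use extended regular expressions in the actions (the default is basic regular expressions)

-i: modify the file being read in place, instead of printing the result to the screen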
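
A minimal sketch of -n, -e and -f on a made-up three-line input, assuming GNU sed. The helper `text` and the temporary script file are hypothetical and exist only for this illustration.

```shell
# Hypothetical three-line input for the option demo.
text() { printf 'line one\nline two\nline three\n'; }

text | sed -n '2p'                      # -n plus p: prints only "line two"
text | sed -e '1d' -e 's/line/LINE/'    # two -e actions: "LINE two", "LINE three"

# -f reads the actions from a file, one per line (a temporary file here).
script=$(mktemp)
printf '%s\n' '3d' 's/one/1/' > "$script"
text | sed -f "$script"                 # "line 1", "line two"
rm -f "$script"
```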

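Action format: [n1[,n2]]function

n1 and n2 are optional; they usually specify the range of lines the action applies to.

function can be one of the following:

a: append; a can be followed by a string, which appears on a new line (after the current line)

c: change; c can be followed by a string, which replaces the lines from n1 to n2

d: delete

i: insert; i can be followed by a string, which appears on a new line (before the current line)

p: print

s: substitute; usually combined with a regular expression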

// Original text
[root@localhost 桌面]# cat -n test.txt
     1    this a test text!
     2    i like linux !
     3    today is monday!
     4    my name is fw.
     5    

// Delete lines 2-3
[root@localhost 桌面]# cat -n test.txt | sed '2,3d'
     1    this a test text!
     4    my name is fw.
     5    

// Delete from line 3 to the last line
[root@localhost 桌面]# cat -n test.txt | sed '3,$d'
     1    this a test text!
     2    i like linux !

// Append a line (after line 2)
[root@localhost 桌面]# cat -n test.txt | sed '2a this line is new'
     1    this a test text!
     2    i like linux !
this line is new
     3    today is monday!
     4    my name is fw.
     5    

// Insert a line (before line 2)
[root@localhost 桌面]# cat -n test.txt | sed '2i this line is new'
     1    this a test text!
this line is new
     2    i like linux !
     3    today is monday!
     4    my name is fw.
     5    

// Replace line 2 with new text (c)
[root@localhost 桌面]# cat -n test.txt | sed '2c this line is new'
     1    this a test text!
this line is new
     3    today is monday!
     4    my name is fw.
     5    

// Print only lines 2-4 (-n suppresses all other lines)
[root@localhost 桌面]# cat -n test.txt | sed -n '2,4p'
     2    i like linux !
     3    today is monday!
     4    my name is fw.
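
Addresses are not limited to line numbers. This sketch (GNU sed assumed; the helper `text4` is hypothetical) uses $ for the last line, a /regex/ address, and c on a range, written in the same one-line form as the examples above; the whole range is replaced by a single line.

```shell
# Hypothetical four-line input.
text4() { printf 'alpha\nbeta\ngamma\ndelta\n'; }

text4 | sed -n '$p'           # delta ($ addresses the last line)
text4 | sed '/^g/d'           # alpha, beta, delta (delete lines matching ^g)
text4 | sed '2,3c replaced'   # alpha, replaced, delta (lines 2-3 become one line)
```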

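Search and replace: sed 's/old_string/new_string/g'

The search string can be a regular expression; the trailing g replaces every match on the line instead of only the first one.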

// View the original text
[root@localhost 桌面]# cat -n test.txt
     1    this a test text!
     2    i like linux !
     3    today is monday!
     4    my name is fw.
     5    

// Replace "this" with "that"
[root@localhost 桌面]# cat -n test.txt | sed 's/this/that/g'
     1    that a test text!
     2    i like linux !
     3    today is monday!
     4    my name is fw.
     5    

// Replace a trailing ! with a period
[root@localhost 桌面]# cat -n test.txt | sed 's/!$/./g'
     1    this a test text.
     2    i like linux .
     3    today is monday.
     4    my name is fw.
     5    

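// Delete everything from the start of the line through "this"; because the input comes from cat -n, the line number on line 1 is removed as well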
[root@localhost 桌面]# cat -n test.txt | sed 's/^.*this//g'
 a test text!
     2    i like linux !
     3    today is monday!
     4    my name is fw.
     5    
[root@localhost 桌面]# 
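
A short sketch of two details of s, assuming GNU sed and made-up input strings: the effect of the g flag, and the difference between anchoring on ^this and using the greedy ^.*this.

```shell
# Without g only the first match on each line is replaced; with g, every match is.
printf 'aaa\n' | sed 's/a/b/'      # baa
printf 'aaa\n' | sed 's/a/b/g'     # bbb

# ^this removes only a leading "this"; ^.*this also removes everything before
# the last "this" on the line (for example a line number added by cat -n).
printf '1 this is this line\n' | sed 's/^this//'     # unchanged: the line starts with "1"
printf '1 this is this line\n' | sed 's/^.*this//'   # " line"
```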

Modifying the file directly:

Use the -i option.

// View the original file
[root@localhost 桌面]# cat test.txt
this a test text!
i like linux !
today is monday!
my name is fw.

// Replace "this" with "that" and write the result back to the file
[root@localhost 桌面]# sed -i 's/this/that/g' test.txt

// View the file again
[root@localhost 桌面]# cat test.txt
that a test text!
i like linux !
today is monday!
my name is fw.
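
Because -i overwrites the file, it can be worth keeping a backup. A minimal sketch, assuming GNU sed, where a suffix attached to -i (here .bak) saves the original next to the edited file; it works on a temporary file rather than test.txt. BSD sed spells this option differently.

```shell
# Work on a temporary copy so no real file is touched.
f=$(mktemp)
printf 'this a test text!\ni like linux !\n' > "$f"

# GNU sed: -i.bak edits in place and keeps the original as "$f.bak".
sed -i.bak 's/this/that/g' "$f"
cat "$f"        # that a test text!
                # i like linux !
cat "$f.bak"    # this a test text!
                # i like linux !
# Clean up afterwards with: rm -f "$f" "$f.bak"
```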

Extended regular expressions

This part is skipped for now.


File formatting and related processing

Formatted output: printf

  printf 'format' actual_content

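Escape sequences in the format:

\a: alert (bell) sound

\b: backspace

\f: form feed (clear screen)

\n: new line

\r: carriage return (Enter key)

\t: horizontal tab

\v: vertical tab

\xNN: NN is a two-digit hexadecimal number; the number is converted to the corresponding character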

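Common variable formats borrowed from the C language:

%ns: n is a number, s means string: print the string in a field n characters wide

%ni: n is a number, i means integer: print the integer in a field n digits wide

%N.nf: N and n are numbers, f means float: a field N characters wide with n digits after the decimal point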
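
A few of these formats side by side, as a sketch run in bash with made-up arguments. The brackets only make the padding visible; %-5s (left alignment) is a standard flag not listed above. Hexadecimal \xNN escapes are accepted by the bash builtin printf but are not required by POSIX, so the octal form \NNN is used here for portability.

```shell
# %5s right-aligns in 5 columns, %-5s left-aligns, %3i pads an integer,
# and %8.3f prints a float 8 columns wide with 3 digits after the point.
printf '[%5s][%-5s][%3i][%8.3f]\n' ab cd 7 3.14159
# [   ab][cd   ][  7][   3.142]

# The format is reused until all arguments are consumed.
printf '%s=%s\n' a 1 b 2
# a=1
# b=2

# Octal 105 is decimal 69 (hexadecimal 45), i.e. the character "E".
printf '\105\n'
# E
```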

// View the original text
[root@localhost 桌面]# cat test.txt
Name    Chinese    English    Math    Average
Tom    80    60    92    77.33
Sherry    75    55    80    70.00
John    60    90    70    73.33


// Print the words separated by tabs; the format is reused for every five words
[root@localhost 桌面]# printf '%s\t %s\t %s\t %s\t %s\t \n' $(cat test.txt)
Name     Chinese     English     Math     Average     
Tom     80     60     92     77.33     
Sherry     75     55     80     70.00     
John     60     90     70     73.33     

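// The header words are not numbers, so %i and %f report "invalid number" and print 0 for them
[root@localhost 桌面]# printf '%10s %5i %5i %5i %8.3f \n' $(cat test.txt)
bash: printf: Chinese: invalid number
bash: printf: English: invalid number
bash: printf: Math: invalid number
bash: printf: Average: invalid number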
      Name     0     0     0    0.000 
       Tom    80    60    92   77.330 
    Sherry    75    55    80   70.000 
      John    60    90    70   73.330 

// Print the character whose hexadecimal code is 45 (decimal 69)
[root@localhost 桌面]# printf '\x45\n'
E
[root@localhost 桌面]# 

awk: a handy data-processing tool

awk 'condition1{action1} condition2{action2} ...' filename

[root@localhost 桌面]# last -n 5
root     pts/0        :0               Mon Jul 18 14:19   still logged in   
root     :0           :0               Mon Jul 18 14:10   still logged in   
(unknown :0           :0               Mon Jul 18 14:08 - 14:10  (00:01)    
reboot   system boot  3.10.0-327.el7.x Mon Jul 18 14:08 - 16:00  (01:52)    
root     pts/0        :0               Sun Jul 17 15:44 - crash  (22:23)    

wtmp begins Mon Apr 25 13:36:45 2016

[root@localhost 桌面]# last -n 5 | awk '{print $1 "\t" $4}'
root    Mon
root    Mon
(unknown    Mon
reboot    3.10.0-327.el7.x
root    Sun
    
wtmp    Apr
[root@localhost 桌面]# 

awk splits each line on spaces or tabs and assigns the resulting fields, in order, to the variables $1, $2, ... ($0 holds the entire line).

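Built-in awk variables

NF: number of fields in the current line

NR: number of the line awk is currently processing

FS: the current field separator; the default is whitespace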

[root@localhost 桌面]# last -n 5 | awk '{print $1 "\t lines:" NR "\t columns:" NF}'
root     lines:1     columns:10
root     lines:2     columns:10
(unknown     lines:3     columns:10
reboot     lines:4     columns:11
root     lines:5     columns:10
     lines:6     columns:0
wtmp     lines:7     columns:7
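
A self-contained version of the same idea on made-up input (the helper `rows` is hypothetical), runnable with any POSIX awk. $NF is the last field, and runs of spaces count as a single separator under the default FS.

```shell
# Hypothetical two-line input with three and two whitespace-separated fields.
rows() { printf 'alice 30 paris\nbob   25\n'; }

# $1 is the first field, $NF the last field;
# NR is the line number and NF the number of fields on that line.
rows | awk '{print NR ": " $1 " (" NF " fields, last=" $NF ")"}'
# 1: alice (3 fields, last=paris)
# 2: bob (2 fields, last=25)
```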

awk comparison operators

>: greater than

<: less than

>=: greater than or equal to

<=: less than or equal to

==: equal to

!=: not equal to

[root@localhost 桌面]# cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin

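// Use ":" as the field separator; this does not take effect on the first line, because awk has already split line 1 with the default separator before {FS=":"} runs
[root@localhost 桌面]# cat /etc/passwd | 
> awk '{FS=":"} $3<10 {print $1 "\t" $3}'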
root:x:0:0:root:/root:/bin/bash    
bin    1
daemon    2
adm    3
lp    4
sync    5
shutdown    6
halt    7
mail    8

// Set FS in a BEGIN block, which runs before any input is read, so the first line is split correctly too
[root@localhost 桌面]# cat /etc/passwd | 
> awk 'BEGIN {FS=":"} $3<10 {print $1 "\t" $3}'
root    0
bin    1
daemon    2
adm    3
lp    4
sync    5
shutdown    6
halt    7
mail    8
[root@localhost 桌面]# 
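
A sketch on made-up passwd-style lines (the helper `pw` is hypothetical; the real /etc/passwd is not read), runnable with any POSIX awk. The -F option sets FS before any input is read, so it behaves like the BEGIN form.

```shell
# Hypothetical passwd-style lines (name:password:UID:GID), not the real /etc/passwd.
pw() { printf 'root:x:0:0\nbin:x:1:1\nuser:x:1000:1000\n'; }

# These two commands are equivalent: FS is set before the first line is read.
pw | awk 'BEGIN {FS=":"} $3 < 10 {print $1 "\t" $3}'
pw | awk -F: '$3 < 10 {print $1 "\t" $3}'
# root    0
# bin     1
```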

Arithmetic in awk

// View the original text
[root@localhost 桌面]# cat pay.txt
Name    1st    2nd    3th
Tom    2300    3200    1200
Sherry    3400    1200    7400
 
// In awk, variables are used directly without $; when an action {} contains several statements, separate them with ";"
[root@localhost 桌面]# cat pay.txt | 
> awk 'NR==1{printf "%10s %10s %10s %10s %10s \n",$1,$2,$3,$4,"Total"}
> NR>=2{total=$2+$3+$4;printf "%10s %10d %10d %10d %10.2f \n",$1,$2,$3,$4,total}'
      Name        1st        2nd        3th      Total 
       Tom       2300       3200       1200    6700.00 
    Sherry       3400       1200       7400   12000.00 
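
Totals can also be taken down a column. A minimal sketch with a hypothetical helper `pay` that reproduces the layout of pay.txt; it uses an END block, which runs once after the last line has been read (END is standard awk but not covered above).

```shell
# Same layout as pay.txt: a header line, then one row per person.
pay() { printf 'Name 1st 2nd 3th\nTom 2300 3200 1200\nSherry 3400 1200 7400\n'; }

# NR>1 skips the header; END runs once after the last line has been read.
pay | awk 'NR > 1 {sum += $2} END {print "1st total:", sum}'
# 1st total: 5700
```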

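File comparison tools

diff

Compares similar files line by line. In the default output, a marker such as 1,2c1 means lines 1-2 of the first file are changed into line 1 of the second file, and 3a3 means line 3 of the second file is added after line 3 of the first file; lines starting with < come from the first file and lines starting with > come from the second.

diff [-bBi] fileA fileB

Options:

-b: ignore differences in the amount of whitespace within a line

-B: ignore differences in blank lines

-i: ignore differences in case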

[root@localhost 桌面]# vim fileA
[root@localhost 桌面]# cp fileA fileB
[root@localhost 桌面]# vim fileB
[root@localhost 桌面]# cat fileA
this is fileA


[root@localhost 桌面]# cat fileB
this is fileB

ok
[root@localhost 桌面]# diff fileA fileB
1,2c1
< this is fileA
< 
---
> this is fileB
3a3
> ok
[root@localhost 桌面]# 
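
The options can be checked through diff's exit status (0 when the files are considered the same, 1 when they differ). A sketch assuming GNU diff, with two temporary files that differ only in case and spacing; the helper `diff_status` is hypothetical.

```shell
# Hypothetical helper: report whether diff considers the files the same.
diff_status() { if diff "$@" > /dev/null; then echo same; else echo different; fi; }

a=$(mktemp); b=$(mktemp)
printf 'Hello  World\n' > "$a"
printf 'hello World\n'  > "$b"

diff_status "$a" "$b"         # different
diff_status -i "$a" "$b"      # different (the double space still differs)
diff_status -i -b "$a" "$b"   # same (case and amount of whitespace ignored)
rm -f "$a" "$b"
```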

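patch

This command is closely tied to diff. Suppose fileA and fileB are two versions of the same file and you want to bring fileA up to date with fileB: first use diff to compare the two files and save the differences as a patch file, then use the patch file to update the old file.

patch -pN < patchFile   <== update

patch -R -pN < patchFile   <== restore

Options:

-p: N is the number of leading directory levels to strip from the file names in the patch

-R: reverse the patch (restore the original file)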

[root@localhost 桌面]# cat fileA
this is fileA


[root@localhost 桌面]# cat fileB
this is fileB

ok

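// Create the patch file (-u selects the unified format, -N treats missing files as empty, -a treats all files as text, -r recurses into directories)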
[root@localhost 桌面]# diff -Naur fileA fileB > file.patch
[root@localhost 桌面]# cat file.patch
--- fileA    2016-07-18 16:36:24.371373349 +0800
+++ fileB    2016-07-18 16:37:31.523401652 +0800
@@ -1,3 +1,3 @@
-this is fileA
-
+this is fileB
 
+ok

// Apply the patch to update the old file; the files are in the current directory, so N is 0
[root@localhost 桌面]# patch -p0 < file.patch
patching file fileA
[root@localhost 桌面]# cat fileA
this is fileB

ok
[root@localhost 桌面]# 



Original article: https://www.cnblogs.com/wuchaodzxx/p/5678709.html