Linux Text Processing Tools

- grep
- awk
  - pattern
  - Common options
  - Built-in variables
  - Regex
  - BEGIN/END
- cut
- sed
  - pattern
  - command
  - Special cases

grep

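In the current directory, search every file whose name starts with test for the string test, and print the lines that contain it. The following command does this: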

$ grep test test* # search files whose names start with "test" for the string "test"  
testfile1:This a Linux testfile! # line in testfile1 that contains "test"  
testfile_2:This is a linux testfile! # line in testfile_2 that contains "test"  
testfile_2:Linux test # line in testfile_2 that contains "test"
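
A few other grep options are often useful alongside the basic search. The sketch below builds its own scratch files (the names test_a.txt and test_b.txt and their contents are made up for illustration) and shows -n, -l, and -c:

```shell
# Build a scratch directory with two sample files (hypothetical names and contents).
dir=$(mktemp -d)
printf 'This is a Linux testfile!\nno match here\n' > "$dir/test_a.txt"
printf 'Linux test\nnothing\n' > "$dir/test_b.txt"

# -n prefixes each matching line with its line number.
numbered=$(cd "$dir" && grep -n test test*)
# -l prints only the names of the files that contain a match.
names=$(cd "$dir" && grep -l test test*)
# -c counts the matching lines in a file.
count=$(grep -c test "$dir/test_a.txt")

printf '%s\n' "$numbered" "$names" "$count"
rm -rf "$dir"
```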

awk

Reference:

https://www.ruanyifeng.com/blog/2018/11/awk.html

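gawk is the open-source GNU implementation of awk; in these notes, awk refers to gawk.

awk is a text processing tool that handles its input one line at a time.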

pattern

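awk 'pattern { action }' filename. The pattern is usually a regex. The regex does not have to match the whole line: if the current line contains a match anywhere, the line is selected and the action runs on it. If the pattern is omitted, the action runs on every line.

Unlike $0, $1, ... in shell scripts, in awk $0 is the current line and $1 is the first field. By default, fields are separated by runs of spaces or tabs, and lines (records) by newlines. For example, for this is a test, $0 is this is a test and $1 is this. You can verify this with echo 'this is a test'|awk '{print $1}'.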
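
The field variables and the pattern/action form can be tried out on inline input. This is a minimal sketch; the sample strings are made up for illustration:

```shell
# $0 is the whole line; $1, $2, ... are the whitespace-separated fields.
first=$(echo 'this is a test' | awk '{ print $1 }')
whole=$(echo 'this is a test' | awk '{ print $0 }')
# A pattern in front of the action restricts it to matching lines.
matched=$(printf 'alpha one\nbeta two\n' | awk '/beta/ { print $2 }')
printf '%s\n' "$first" "$whole" "$matched"
```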

Common options

-F

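Sets the field separator. The default separator is whitespace (runs of spaces or tabs).

echo 'this is a test'|awk -F: '{print $1}' prints this is a test, because the line contains no : to split on.

awk -F: '{print $1}' /etc/passwd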
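
To see -F split real fields without depending on the local /etc/passwd, the sketch below uses two made-up entries in the same colon-separated layout:

```shell
# Sample data in /etc/passwd layout (made-up entries).
sample='root:x:0:0:root:/root:/bin/bash
demo:x:1000:1000:Demo User:/home/demo:/bin/sh'
# Print the user name (field 1) and the login shell (field 7).
# The comma in print joins the values with a single space.
shells=$(printf '%s\n' "$sample" | awk -F: '{ print $1, $7 }')
printf '%s\n' "$shells"
```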

Built-in variables

NF

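The number of fields in the current line, as split by the separator.

echo 'this is a test'|awk '{print $NF}' prints the last field, test, because $NF is the field whose index equals the field count. $(NF-1) is the second-to-last field.

NR

The number of the current line.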

awk -F: '{print NR")" $1}' /etc/passwd
...
41)pcp
42)sshd
43)avahi
44)postfix
45)oprofile
46)tcpdump
47)chz
...
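
NR and NF can be combined in one action. A small sketch on made-up input:

```shell
# For each line, print its number, its field count, and its last field.
report=$(printf 'a b c\nd e\n' | awk '{ print NR ": " NF " fields, last=" $NF }')
printf '%s\n' "$report"
```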

Regex

A regular expression is written between two slashes: /regex/.

[root@chz Desktop]# awk '/chz/{print $1}' /etc/passwd
chz:x:1000:1000:chz:/home/chz:/bin/bash

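This matches the lines of /etc/passwd that contain chz and prints their first field. Because no -F: is given and the line contains no whitespace, the first field is the whole line.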
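
A regex can also be tested against a single field with the ~ operator instead of against the whole line. The sketch below uses made-up passwd-style entries:

```shell
# Made-up entries in /etc/passwd layout.
sample='chz:x:1000:1000:chz:/home/chz:/bin/bash
root:x:0:0:root:/root:/bin/bash
nobody:x:65534:65534:nobody:/:/usr/sbin/nologin'
# Match the regex against field 7 only, and print field 1 for matching lines.
bash_users=$(printf '%s\n' "$sample" | awk -F: '$7 ~ /bash$/ { print $1 }')
printf '%s\n' "$bash_users"
```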

BEGIN/END

The BEGIN block runs once before any input line is processed, and the END block runs once after the last line has been processed.

[root@chz Desktop]# awk 'BEGIN{print "begin block"}/^chz/{print $0}END{print "end block"}' /etc/passwd
begin block
chz:x:1000:1000:chz:/home/chz:/bin/bash
end block
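
A common use of BEGIN and END is to initialise and report an aggregate. As a sketch, this sums the first column of the four-line sample used elsewhere in these notes:

```shell
# BEGIN sets up the accumulator, the main action adds field 1 of every line,
# and END prints the totals once all input has been read.
total=$(printf '1 a hello\n2 b banna\n3 c cat\n4 d dog\n' |
  awk 'BEGIN { sum = 0 } { sum += $1 } END { print "lines=" NR, "sum=" sum }')
printf '%s\n' "$total"
```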

cut

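Extracts selected parts of each line.

-d

Sets the delimiter. The default delimiter is a tab.

-f

Selects the field numbers to extract, counting from 1.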

example 1

[root@chz Desktop]# cat test
1 a hello
2 b banna
3 c cat
4 d dog
[root@chz Desktop]# cut -f 1 -d " " test 
1
2
3
4

example 2

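Note that grep selects whole lines and supports regular expressions, so it can pick out a line before cut extracts a field from it.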

[root@chz Desktop]# cat test|grep dog
4 d dog
[root@chz Desktop]# cat test|grep dog|cut -d ' ' -f 2 
d
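
cut can also take several fields, a field range, or character positions. A minimal sketch on one made-up passwd-style line:

```shell
line='chz:x:1000:1000:chz:/home/chz:/bin/bash'
# Fields 1 and 7, colon-delimited; cut keeps the delimiter between them.
fields=$(printf '%s\n' "$line" | cut -d: -f1,7)
# A range of fields: 3 through 4.
ids=$(printf '%s\n' "$line" | cut -d: -f3-4)
# -c selects character positions instead of fields.
chars=$(printf '%s\n' "$line" | cut -c1-3)
printf '%s\n' "$fields" "$ids" "$chars"
```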

sed

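A stream editor. By default it writes the result to standard output and does not change the contents of the input file.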

pattern

sed [options] [command...] [input-file]

command

[root@chz Desktop]# cat test
1 a hello
2 b banna
3 c cat
4 d dog

a (append)

Adds the given text as a new line after the current line.

[root@chz Desktop]# sed aok test
1 a hello
ok
2 b banna
ok
3 c cat
ok
4 d dog
ok

i (insert)

Adds the given text as a new line before the current line.

[root@chz Desktop]# sed iok test
ok
1 a hello
ok
2 b banna
ok
3 c cat
ok
4 d dog

An operation can be limited to a specific line. The backslash here can be replaced with a space:

[root@chz Desktop]# sed '2i\tea' test
1 a hello
tea
2 b banna
3 c cat
4 d dog
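
Besides a line number, the address in front of a or i can be a regex. The sketch below assumes GNU sed, whose one-line form of a and i is what the examples above use; the inserted texts are made up:

```shell
input='1 a hello
2 b banna
3 c cat
4 d dog'
# Append a line after every line matching /cat/ (GNU sed one-line form).
appended=$(printf '%s\n' "$input" | sed '/cat/a after cat')
# Insert a line before line 1 only.
inserted=$(printf '%s\n' "$input" | sed '1i header')
printf '%s\n' "$appended" "$inserted"
```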

c (change)

Replaces whole lines with the given text. Without an address, every line is replaced.

[root@chz Desktop]# sed cok test
ok
ok
ok
ok

Replace lines 2 through 4 with a cup of tea. The whole range is replaced by a single line:

[root@chz Desktop]# sed '2,4ca cup of tea' test
1 a hello
a cup of tea
[root@chz Desktop]#

d (delete)

Deletes the addressed lines.

[root@chz Desktop]# sed 2,4d  test
1 a hello

This deletes lines 2 through 4.
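
The address for d can also be a regex, or a range that ends at the last line. A small sketch on the same four-line sample:

```shell
input='1 a hello
2 b banna
3 c cat
4 d dog'
# Delete every line that matches a regex.
no_b=$(printf '%s\n' "$input" | sed '/banna/d')
# Delete from line 3 through the last line ($).
head2=$(printf '%s\n' "$input" | sed '3,$d')
printf '%s\n' "$no_b" "$head2"
```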

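p (print only)

sed already prints every line automatically, so with p a matching line is printed twice. Use the -n option to suppress the automatic printing, so that only the matching lines appear.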
```
[root@chz Desktop]# nl /etc/passwd|sed -n '/chz/p'
    47	chz:x:1000:1000:chz:/home/chz:/bin/bash
```
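
The effect of -n is easiest to see by running p with and without it. A minimal sketch on the four-line sample:

```shell
input='1 a hello
2 b banna
3 c cat
4 d dog'
# With -n, only the addressed lines (2 through 3) are printed.
middle=$(printf '%s\n' "$input" | sed -n '2,3p')
# Without -n, the matching line is printed twice: once by p, once automatically.
doubled=$(printf '%s\n' "$input" | sed '/dog/p')
printf '%s\n' "$middle" "$doubled"
```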

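s/old/new/

Replaces text matching the regex old with new. Without a flag, only the first match on each line is replaced; with the g flag (s/old/new/g), every match is replaced.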

[root@chz Desktop]# sed 's/dog/pig/' test
1 a hello
2 b banna
3 c cat
4 d pig
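
The difference between replacing the first match and all matches, and the use of & in the replacement, can be checked on short made-up strings:

```shell
# Without g, only the first match on each line is replaced.
first_only=$(echo 'a-b-c' | sed 's/-/+/')
# With g, every match on the line is replaced.
all=$(echo 'a-b-c' | sed 's/-/+/g')
# & in the replacement stands for the text that was matched.
wrapped=$(echo 'cat dog' | sed 's/dog/[&]/')
printf '%s\n' "$first_only" "$all" "$wrapped"
```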

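Use -e to pass several command expressions. With -i, sed edits the source file in place; an optional suffix (for example -i.bak) keeps a backup of the original under that name. In the second transcript below, the file evidently contained 3 c cai before the command ran: test.bak preserves that original, and test holds the edited result.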

[root@chz Desktop]# sed -e '1ased' -e 's/dog/pig/' test
1 a hello
sed
2 b banna
3 c cat
4 d pig
-------------------
[root@chz Desktop]# sed -i.bak 's/i/t/' test
[root@chz Desktop]# cat test.bak 
1 a hello
2 b banna
3 c cai
4 d dog
[root@chz Desktop]# cat test
1 a hello
2 b banna
3 c cat
4 d dog
[root@chz Desktop]#
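
The relationship between the edited file and its backup can be reproduced safely on a scratch file. This is a sketch; the file comes from mktemp and its two lines are made up:

```shell
tmp=$(mktemp)
printf '1 a hello\n4 d dog\n' > "$tmp"
# Edit in place, keeping the original content as "$tmp.bak".
sed -i.bak 's/dog/pig/' "$tmp"
edited=$(cat "$tmp")
backup=$(cat "$tmp.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"
rm -f "$tmp" "$tmp.bak"
```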

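Special cases

Inside the regex of s//, $ anchors the end of a line. When used as an address in front of other commands, $ means the last line of the input.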

[root@chz Desktop]# sed '$a runoob' test
1 a hello
2 b banna
3 c cat
4 d dog
runoob
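
The two meanings of $ can be put side by side. A minimal sketch; note the single quotes, which stop the shell from expanding $ itself:

```shell
input='1 a hello
2 b banna
3 c cat
4 d dog'
# As an address, $ selects the last line of the input.
last=$(printf '%s\n' "$input" | sed -n '$p')
# Inside s///, $ anchors the end of each line instead.
marked=$(printf 'ab\ncd\n' | sed 's/$/;/')
printf '%s\n' "$last" "$marked"
```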