How awk processes lines
Like sed, awk processes its input one line at a time, and it can also split each line into fields.
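awk command format
awk [options] 'script' var=value file(s) or awk [options] -f scriptfile var=value file(s)
Options:
# Only the commonly used ones are listed here
-F fs or --field-separator fs: sets the input field separator. fs is a string or a regular expression, for example -F:.
-f scriptfile or --file scriptfile: reads the awk program from a script file.
The long forms (--field-separator, --file) are GNU awk extensions.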
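The options above can be tried with a minimal sketch like the following. The file names users.txt and first_field.awk and the variable prefix are hypothetical, created only for this illustration.

```shell
# Hypothetical sample input, created only for this illustration.
workdir=$(mktemp -d)
printf 'root:x:0\nalice:x:1000\n' > "$workdir/users.txt"

# -F sets the input field separator; here it is a single colon.
names=$(awk -F: '{print $1}' "$workdir/users.txt")

# -f reads the awk program from a file instead of the command line.
# The var=value operand (prefix=...) is assigned before users.txt is read.
printf '{print prefix $1}\n' > "$workdir/first_field.awk"
tagged=$(awk -F: -f "$workdir/first_field.awk" prefix="user=" "$workdir/users.txt")

printf '%s\n' "$names" "$tagged"
rm -r "$workdir"
```

Keeping the program in a script file tends to be easier to maintain once it grows beyond one line, since it avoids shell quoting altogether.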
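Basic format:
$ awk [options] 'script' files
A script consists of two parts:
1. A pattern, which can be a regular expression or a logical expression
2. { awk commands }: the action, a block of code enclosed in braces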
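Either part of a script may be omitted. The sketch below shows the three combinations, reusing the two sample lines from this article's data.txt inlined as a shell variable; the variable names are illustrative assumptions.

```shell
# The two sample lines from data.txt, inlined so the sketch is self-contained.
sample='log1.txt female BeiJing 90 Yes
log2.txt male ShangHai 55'

# Pattern only: the default action prints the whole matching line.
pattern_only=$(printf '%s\n' "$sample" | awk '/ShangHai/')

# Action only: with no pattern, the action runs for every line.
action_only=$(printf '%s\n' "$sample" | awk '{print $1}')

# Pattern and action: the action runs only for lines where field 4 exceeds 60.
both=$(printf '%s\n' "$sample" | awk '$4 > 60 {print $1, $4}')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```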
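awk built-in variables
Variable | Meaning |
$0 | The current record (the whole line) |
$1 ... $n | Field (column) 1 through n of the current record |
FILENAME | Name of the current input file |
FS | Input field separator (Field Separator) |
RS | Input record separator, i.e. what separates one line from the next (Record Separator) |
NF | Number of fields in the current record (Number of Fields) |
NR | Number of the current record (the line number, counted across all input) |
OFS | Output field separator |
ORS | Output record separator |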
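As a rough illustration of how these variables interact, the sketch below sets FS, OFS, RS and ORS in a BEGIN block. The input strings are hypothetical and exist only for the example.

```shell
# Hypothetical colon-separated input, inlined for illustration.
input='alice:1000:/home/alice
bob:1001:/home/bob'

# FS splits each record on ":"; OFS is placed between comma-separated print arguments.
# NR is the record number and NF is the number of fields in the current record.
report=$(printf '%s\n' "$input" | awk 'BEGIN {FS=":"; OFS=" | "} {print NR, NF, $1, $NF}')

# RS and ORS change the record separators: read ";"-separated records
# and write each one on its own line.
records=$(printf 'a;b;c' | awk 'BEGIN {RS=";"; ORS="\n"} {print NR ":" $0}')

printf '%s\n' "$report" "$records"
```

Note that $NF (with the dollar sign) is the value of the last field, while NF alone is the field count.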
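awk functions
Function | Meaning |
length(str) | Returns the number of characters in str |
int(num) | Returns the integer part of num |
index(str1, str2) | Returns the position of str2 in str1 (counting from 1), or 0 if str2 does not occur |
split(str, arr, separator) | Splits str on separator, stores the pieces in the array arr, and returns the number of elements |
printf(fmt, args) | Formats args according to fmt and prints the result (a statement rather than a function; the parentheses are optional) |
sprintf(fmt, args) | Formats args according to fmt and returns the formatted string |
substr(str, pos, len) | Returns the substring of str that starts at position pos (counting from 1) and is len characters long |
tolower(str) | Returns a copy of str converted to lowercase |
toupper(str) | Returns a copy of str converted to uppercase |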
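The functions in the table can be exercised without any input file by putting the calls in a BEGIN block. This is only a sketch; the string "Hello,awk,World" and the variable names are made up for the example.

```shell
out=$(awk 'BEGIN {
    s = "Hello,awk,World"
    print length(s)              # number of characters in s
    print int(3.9)               # integer part, truncated toward zero
    print index(s, "awk")        # 1-based position, 0 if absent
    n = split(s, parts, ",")     # fills parts[1] .. parts[n]
    print n, parts[2]
    print substr(s, 7, 3)        # 3 characters starting at position 7
    print toupper(parts[2]), tolower(parts[1])
    msg = sprintf("%-6s|%03d", "id", 7)   # format into a string without printing
    print msg
}')
printf '%s\n' "$out"
```

String positions and array indexes produced by split both start at 1, which is easy to get wrong when coming from C-like languages.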
Examples: testing awk variables and functions
Test data
(base) [root@localhost Tana]# cat data.txt
log1.txt female BeiJing 90 Yes
log2.txt male ShangHai 55
Print the line number, the number of fields, and the content of each line
(base) [root@localhost Tana]# awk '{print "row:" NR, "fields:" NF, $0}' data.txt
row:1 fields:5 log1.txt female BeiJing 90 Yes
row:2 fields:4 log2.txt male ShangHai 55
Print the gender and score of every record (fields 2 and 4; the city is field 3)
(base) [root@localhost Tana]# awk '{print $2" "$4}' data.txt
female 90
male 55
Format the output with printf
(base) [root@localhost Tana]# awk -F ":" '{printf "Line:%s Field:%s User:%s\n",NR,NF,$1}' /etc/passwd
Line:1 Field:7 User:root
Line:2 Field:7 User:bin
Line:3 Field:7 User:daemon
Line:4 Field:7 User:adm
Line:5 Field:7 User:lp
Line:6 Field:7 User:sync
Line:7 Field:7 User:shutdown
Line:8 Field:7 User:halt
Line:9 Field:7 User:mail
Line:10 Field:7 User:operator
Line:11 Field:7 User:games
Line:12 Field:7 User:ftp
Line:13 Field:7 User:nobody
Line:14 Field:7 User:systemd-network
Line:15 Field:7 User:dbus
Line:16 Field:7 User:polkitd
Line:17 Field:7 User:libstoragemgmt
Line:18 Field:7 User:colord
Line:19 Field:7 User:rpc
Line:20 Field:7 User:gluster
Line:21 Field:7 User:saslauth
Line:22 Field:7 User:abrt
Line:23 Field:7 User:rtkit
Line:24 Field:7 User:pulse
Line:25 Field:7 User:radvd
Line:26 Field:7 User:unbound
Line:27 Field:7 User:chrony
Line:28 Field:7 User:rpcuser
Line:29 Field:7 User:nfsnobody
Line:30 Field:7 User:qemu
Line:31 Field:7 User:tss
Line:32 Field:7 User:usbmuxd
Line:33 Field:7 User:geoclue
Line:34 Field:7 User:ntp
Line:35 Field:7 User:sssd
Line:36 Field:7 User:setroubleshoot
Line:37 Field:7 User:saned
Line:38 Field:7 User:gdm
Line:39 Field:7 User:gnome-initial-setup
Line:40 Field:7 User:sshd
Line:41 Field:7 User:avahi
Line:42 Field:7 User:postfix
Line:43 Field:7 User:tcpdump
Line:44 Field:7 User:agiga
Line:45 Field:7 User:agiga_190
Print the line number, field count, and user name for users whose UID is greater than 99 (that is, 100 or higher); other users are not printed
(base) [root@localhost Tana]# awk -F ':' '{if ($3>99){ printf("Line:%s Field:%s User:%s\n",NR,NF,$1) } else { printf "" } }' /etc/passwd
Line:14 Field:7 User:systemd-network
Line:16 Field:7 User:polkitd
Line:17 Field:7 User:libstoragemgmt
Line:18 Field:7 User:colord
Line:20 Field:7 User:gluster
Line:21 Field:7 User:saslauth
Line:22 Field:7 User:abrt
Line:23 Field:7 User:rtkit
Line:24 Field:7 User:pulse
Line:26 Field:7 User:unbound
Line:27 Field:7 User:chrony
Line:29 Field:7 User:nfsnobody
Line:30 Field:7 User:qemu
Line:32 Field:7 User:usbmuxd
Line:33 Field:7 User:geoclue
Line:35 Field:7 User:sssd
Line:36 Field:7 User:setroubleshoot
Line:37 Field:7 User:saned
Line:39 Field:7 User:gnome-initial-setup
Line:44 Field:7 User:agiga
Line:45 Field:7 User:agiga_190
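Logical expressions
A logical expression (the pattern) is written before the { awk commands } block
~ and !~ test whether a value matches or does not match a regular expression; the regular expression follows the operator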
# Find users on this host whose user name contains 'g', and print the user name and UID
(base) [root@localhost Tana]# awk -F ':' '$1 ~ /g/ {printf "USER:%-15s UID:%s\n",$1,$3 }' /etc/passwd
USER:games           UID:12
USER:libstoragemgmt  UID:998
USER:gluster         UID:996
USER:geoclue         UID:992
USER:gdm             UID:42
USER:gnome-initial-setup UID:988
USER:agiga           UID:1000
USER:agiga_190       UID:1001
==, !=, >=, <=, <, >, ||, && compare values, test equality, and combine conditions
# Print records whose score (field 4) is above 70, or at or below 60
awk '$4 > 70 || $4 <= 60 {print $0}' data.txt
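The operators above can be combined freely in a pattern. The following sketch uses a few hypothetical /etc/passwd-style lines (the user alice is invented) so that the result does not depend on the host.

```shell
# Hypothetical /etc/passwd-style lines, inlined for illustration.
passwd_sample='root:x:0:0:root:/root:/bin/bash
games:x:12:100:games:/usr/games:/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash'

# && requires both tests: UID at least 1000 and a login shell ending in "bash".
login_users=$(printf '%s\n' "$passwd_sample" | awk -F: '$3 >= 1000 && $7 ~ /bash$/ {print $1}')

# !~ selects records whose field does NOT match the regular expression.
no_bash=$(printf '%s\n' "$passwd_sample" | awk -F: '$7 !~ /bash$/ {print $1}')

# == tests equality; here it compares the first field with a string.
root_uid=$(printf '%s\n' "$passwd_sample" | awk -F: '$1 == "root" {print $3}')

printf '%s\n' "$login_users" "$no_bash" "$root_uid"
```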
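Extended format
awk [options] 'command' file
The extended format extends command, as follows:
BEGIN { command1 } pattern { awk commands } END { command2 }
command1 in the BEGIN block runs exactly once, before the first line is read. The awk commands in the middle then run for each input line that matches the pattern. Finally, command2 in the END block runs exactly once, after all lines have been read.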
(base) [root@localhost Tana]# awk 'BEGIN {print "start" } $0 ~/female/ {print $0} END {print "end"}' data.txt
start
log1.txt female BeiJing 90 Yes
end
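A common use of this structure is to accumulate a value per line and report it in END. The sketch below, which assumes the two sample lines of data.txt and an invented variable named total, sums and averages the score in field 4.

```shell
# The two sample lines from data.txt, inlined so the sketch is self-contained.
sample='log1.txt female BeiJing 90 Yes
log2.txt male ShangHai 55'

summary=$(printf '%s\n' "$sample" | awk '
BEGIN { total = 0 }        # runs once, before any input is read
      { total += $4 }      # runs for every input line
END   { printf "lines=%d total=%d avg=%.1f\n", NR, total, total / NR }  # runs once, after the last line
')
printf '%s\n' "$summary"
```

In the END block NR still holds the number of the last record read, so it doubles as the line count.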
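BEGIN and END
BEGIN is typically used to do the following:
1. Define separators
2. Define a table header
Custom input separator and the separator used for output
FS (field separator)
OFS (output field separator)
# Define the separators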
(base) [root@localhost Tana]# awk 'BEGIN {FS= " ";OFS="-"} $0 ~/female/{print $1,$2} END {print "end"}' data.txt
log1.txt-female
log4.txt-female
log6.txt-female
end
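## Note: in FS=" " the value must be in double quotes, not single quotes. awk string literals use double quotes, and a single quote would end the shell-quoted awk program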
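One detail that is easy to miss is when OFS actually takes effect. The sketch below, using the first sample line of data.txt, suggests that OFS is inserted only between comma-separated print arguments, or when $0 is rebuilt after a field assignment.

```shell
line='log1.txt female BeiJing 90 Yes'

# Comma-separated print arguments are joined with OFS.
with_comma=$(printf '%s\n' "$line" | awk 'BEGIN {OFS="-"} {print $1, $2}')

# Adjacent expressions without a comma are concatenated directly; OFS is not used.
without_comma=$(printf '%s\n' "$line" | awk 'BEGIN {OFS="-"} {print $1 $2}')

# $0 keeps its original separators until a field is assigned;
# assigning $1 to itself rebuilds $0 with OFS.
rebuilt=$(printf '%s\n' "$line" | awk 'BEGIN {OFS="-"} {$1 = $1; print $0}')

printf '%s\n' "$with_comma" "$without_comma" "$rebuilt"
```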
Define a table header
(base) [root@localhost Tana]# awk 'BEGIN {print "NAME" " " "Gender" " " "Addr" " " "Score" }
NF==4 {print $0}
' data.txt
NAME Gender Addr Score
log2.txt male ShangHai 55
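If the columns should line up under the header, printf with fixed widths is one option. This is a sketch only; the column widths (10, 8, 10) are arbitrary choices for the sample data, not something the original example prescribes.

```shell
# The two sample lines from data.txt, inlined so the sketch is self-contained.
sample='log1.txt female BeiJing 90 Yes
log2.txt male ShangHai 55'

# %-10s left-justifies a string in a 10-character column.
table=$(printf '%s\n' "$sample" | awk '
BEGIN { printf "%-10s %-8s %-10s %s\n", "NAME", "Gender", "Addr", "Score" }
      { printf "%-10s %-8s %-10s %s\n", $1, $2, $3, $4 }
')
printf '%s\n' "$table"
```

Using the same format string for the header and the rows keeps the two in sync when a width changes.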