awk Study Notes

I. Using awk

1. Invoking awk

awk [options] -f progfile [--] file ...

awk [options] [--] 'program' file ...

2. Command-line options

-F fs

--field-separator fs

Sets the field separator. For example, to print all user names:

awk -F : '{print $1}' /etc/passwd

-f source-file

--file source-file

Reads the awk program from a file, e.g. awk -f file /etc/passwd

The contents of file are:

#!/bin/awk -f

BEGIN {FS=":"}

{print $1}

-v var=val

--assign var=val

Assigns the value val to the variable var before the program starts, e.g. awk -v foo=hello -v bar=world 'BEGIN {print foo,bar}'

3. Including other files in an awk program

@include "test1"

BEGIN {

print "This is script test2."

}

II. Regular expressions

1. How to use regular expressions

Find the lines that contain foo and print their second field:

awk '/foo/ { print $2 }' BBS-list

-| 555-1234

-| 555-6699

-| 555-6480

-| 555-2127

The following forms can also be used:

exp ~ /regexp/ or exp !~ /regexp/

Print the lines whose first field contains J:

awk '$1 ~ /J/' inventory-shipped

Jan 13 25 15 115

Jun 31 42 75 492

Jul 24 34 67 436

Jan 21 36 64 620
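The two match operators can be compared side by side. The sketch below is only an illustration: it uses a few made-up records fed through a pipe instead of the inventory-shipped file, and the variable names data, matched and unmatched are arbitrary.

```shell
# Hypothetical sample records; only the first field matters here.
data='Jan 13
Feb 15
Jun 31
Mar 15'

# $1 ~ /J/ keeps the records whose first field contains J.
matched=$(printf '%s\n' "$data" | awk '$1 ~ /J/ { print $1 }')

# $1 !~ /J/ keeps the records whose first field does not contain J.
unmatched=$(printf '%s\n' "$data" | awk '$1 !~ /J/ { print $1 }')

echo "$matched"
echo "$unmatched"
```

The first command prints Jan and Jun, the second Feb and Mar.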

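2. Escape sequences

Some characters must be escaped inside a string constant, such as the double quote ":

awk 'BEGIN { print "He said \"hi!\" to her." }'

The meaning of some escape sequences:

\\

A literal backslash.

\b

Backspace, Ctrl-h, ASCII code 8 (BS).

\n

Newline.

\r

Carriage return.

\t

Horizontal tab.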
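A small sketch of escape sequences inside awk string constants. The strings are invented for illustration; the program is wrapped in single quotes so that the shell passes the backslashes to awk unchanged.

```shell
# \t becomes a tab, \" a double quote, and \\ a single backslash.
line=$(awk 'BEGIN {
    print "name\tvalue"
    print "She said \"hi\""
    print "C:\\temp"
}')

printf '%s\n' "$line"
```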

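3. Regular expression operators

\

Suppresses the special meaning of a character, e.g. \$ matches a literal $.

^

Matches the beginning of a string.

$

Matches the end of a string.

.

Matches any single character.

[...]

Bracket expression; matches any one of the characters listed inside the brackets.

[^ ...]

Complemented bracket expression; matches any one character not listed inside the brackets.

|

Alternation, e.g. ^P|M matches a P at the beginning of the string or an M anywhere in it.

(...)

Grouping; | can be used inside the group, e.g. @(samp|code)\{[^}]+\} matches both @code{foo} and @samp{bar}.

*

Matches zero or more occurrences of the preceding character or group.

+

Like *, but matches one or more occurrences.

?

Matches zero or one occurrence of the preceding character or group.

{i}

Matches exactly i occurrences of the preceding character or group, where i is an integer.

{i,j}

Matches from i to j occurrences of the preceding character or group, inclusive.

{i,}

Matches i or more occurrences of the preceding character or group.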
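To see a few of these operators at work, here is a rough sketch on an invented word list (the words and variable names are assumptions made for the example). Interval expressions such as {i,j} are left out on purpose, because support for them varies between awk implementations and versions.

```shell
# Hypothetical input, one word per line.
words='Perl
Make
awk
x42
abc'

# ^(P|M)  : grouping plus alternation, anchored at the start of the record.
starts=$(printf '%s\n' "$words" | awk '/^(P|M)/')

# [0-9]+$ : one or more digits from a bracket expression, anchored at the end.
digits=$(printf '%s\n' "$words" | awk '/[0-9]+$/')

# ^[^PM]  : a first character that is neither P nor M.
others=$(printf '%s\n' "$words" | awk '/^[^PM]/')

echo "$starts"
echo "$digits"
echo "$others"
```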

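4. gawk-specific regexp operators

\s

Matches any whitespace character.

\S

Matches any non-whitespace character.

\w

Matches any word-constituent character: a letter, digit, or underscore. Which characters count as letters depends on the locale.

\W

Matches any character that is not a letter, digit, or underscore.

\>

Matches the empty string at the end of a word, e.g. /stow\>/ matches stow but not stowaway.

5. Case sensitivity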

x = "aB"

if (x ~ /ab/) ... # this test will fail

IGNORECASE = 1

if (x ~ /ab/) ... # now it will succeed

IGNORECASE = 1 can be set on the command line or in a BEGIN rule. IGNORECASE is a gawk extension.

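III. Reading input files

awk reads input one record at a time (by default, one line) and splits each record into fields, by default on runs of whitespace.

The following variables are involved in reading input:

FS:

Field separator; the default is a single space, which means that fields are separated by runs of whitespace.

RS:

Record separator; the default is a newline.

NF:

Number of fields in the current record; it changes with each record.

NR:

Number of records read so far; it increases with each record.

1. Print all Linux user names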
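How these variables behave can be checked on a tiny input. The sketch below pipes two invented colon-separated records into awk and prints, for each record, its number, its field count and its last field; the variable name summary is arbitrary.

```shell
# Two hypothetical records: the first has three fields, the second has two.
summary=$(printf 'a:b:c\nd:e\n' | awk 'BEGIN { FS = ":" } { print NR, NF, $NF }')

printf '%s\n' "$summary"
```

This prints "1 3 c" for the first record and "2 2 e" for the second.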

awk 'BEGIN {FS=":" } {print $1}' /etc/passwd

2. Print the last field

awk 'BEGIN {FS=":" } {print $NF}' /etc/passwd

3. List the fields vertically, one per line

awk 'BEGIN {RS=":"} {print}' /etc/passwd

4. Count the lines

awk 'END {print NR}' /etc/passwd

IV. Printing output

1. Using print

Syntax: print item1, item2, ...

Examples:

awk 'BEGIN { print "line one\nline two\nline three" }'

awk '{ print $1, $2 }' inventory-shipped

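2. Output separators

OFS:

Output field separator.

ORS:

Output record separator.

Separate fields with a percent sign and print a blank line after each record:

awk 'BEGIN{FS=":";OFS="%";ORS="\n\n"} {print $1,$2}' /etc/passwd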
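A point that is easy to miss: OFS is inserted only between items separated by commas in the print statement, while items written side by side are simply concatenated. The sketch below shows both cases on made-up passwd-style records, using ";" as ORS so that the result fits on one line (the data and variable names are assumptions for the example).

```shell
# With commas, print puts OFS between the items and ORS after the record.
joined=$(printf 'root:x:0\nbin:x:1\n' |
    awk 'BEGIN { FS = ":"; OFS = "%"; ORS = ";" } { print $1, $3 }')

# Without commas, the items are concatenated and OFS is not used.
glued=$(printf 'root:x:0\nbin:x:1\n' |
    awk 'BEGIN { FS = ":"; OFS = "%"; ORS = ";" } { print $1 $3 }')

echo "$joined"
echo "$glued"
```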

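3. Using printf

Syntax: printf format, item1, item2, ...

Format-control letters:

%c

Prints a number as the character with that code, e.g. printf "%c", 65 prints A.

%d, %i

Prints a decimal integer, e.g. printf "%d\n", 6.5 prints 6.

%e, %E

Prints a number in scientific (exponential) notation, e.g. printf "%4.3e\n", 1950 prints 1.950e+03.

%f

Prints a number in floating-point notation, e.g. printf "%4.3f", 1950 prints 1950.000.

%s

Prints a string, e.g. printf "%10s\n", 1950 prints six spaces followed by 1950 (right-justified in a field of width 10).

Format modifiers:

N$

Positional specifier (a gawk extension); selects which argument to format. printf "%s %s %s\n", "linux", "like", "I" prints linux like I; with the positions changed, printf "%3$s %2$s %1$s\n", "linux", "like", "I" prints I like linux.

-

Minus sign; placed before the width, it left-justifies the output, since the default is right-justified. For example, printf "%-4s", "foo" prints foo followed by one space.

space

For numeric conversions, prefixes positive values with a space and negative values with a minus sign.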

Example:

1) Print the first field left-justified in a field 10 characters wide:

$ awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list

-| aardvark   555-5553

-| alpo-net   555-3412

-| barfly     555-7685

-| bites      555-1675

-| camelot    555-0542

-| core       555-2912

-| fooey      555-1234

-| foot       555-6699

-| macfoo     555-6480

-| sdace      555-3430

-| sabafoo    555-2127
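The effect of widths and the minus modifier is easier to see when the output is wrapped in visible delimiters. The following sketch is an illustration only; the square brackets are added purely to mark where each field starts and ends, and the variable name formatted is arbitrary.

```shell
formatted=$(awk 'BEGIN {
    printf "[%10s]\n", 1950      # right-justified in a field of width 10
    printf "[%-10s]\n", "foo"    # left-justified in a field of width 10
    printf "[%d]\n", 6.5         # truncated to an integer
    printf "[%4.3f]\n", 1950     # three digits after the decimal point
    printf "[%c]\n", 65          # the character whose code is 65
}')

printf '%s\n' "$formatted"
```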

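4. Redirecting the output of print and printf

print items > output-file

Writes items to a file. For example, to save user names and home directories in separate files: awk -F: '{ print $1 > "username"; print $6 > "home" }' /etc/passwd

print items | command

Pipes items to a command. For example, to count the users: awk -F: '{ print $1 | "wc -l" }' /etc/passwd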
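Piping to a command works with any filter, not only wc. As a rough sketch with invented names as input, the lines below are sent to sort; calling close() in the END rule closes the pipe so that sort finishes and writes its output before awk exits.

```shell
# Every record is written to one shared pipe to the sort command.
sorted=$(printf 'carol\nalice\nbob\n' |
    awk '{ print $1 | "sort" } END { close("sort") }')

printf '%s\n' "$sorted"
```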

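V. Expressions

1. Operators

Arithmetic operators

- x

Negation.

+ x

Unary plus; converts the expression to a number.

x ^ y

x ** y

Exponentiation.

x * y

Multiplication.

x / y

Division; the result is a floating-point number, e.g. 3 / 4 is 0.75.

x + y

Addition.

x - y

Subtraction.

Assignment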

lvalue += increment Adds increment to the value of lvalue.

lvalue -= decrement Subtracts decrement from the value of lvalue.

lvalue *= coefficient Multiplies the value of lvalue by coefficient.

lvalue /= divisor Divides the value of lvalue by divisor.

lvalue %= modulus Sets lvalue to its remainder by modulus.

lvalue ^= power

lvalue **= power Raises lvalue to the power power. (c.e.)

Increment and decrement operators

++lvalue

Increment lvalue, returning the new value as the value of the expression.

lvalue++

Increment lvalue, returning the old value of lvalue as the value of the expression.

--lvalue

Decrement lvalue, returning the new value as the value of the expression. (This expression is like ‘++lvalue’, but instead of adding, it subtracts.)

lvalue--

Decrement lvalue, returning the old value of lvalue as the value of the expression. (This expression is like ‘lvalue++’, but instead of adding, it subtracts.)
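The difference between the prefix and postfix forms, together with two of the assignment operators, can be traced in a short sketch (the variable names x, a and b are arbitrary):

```shell
result=$(awk 'BEGIN {
    x = 5
    a = x++    # a gets the old value 5, x becomes 6
    b = ++x    # x becomes 7, b gets the new value 7
    x += 3     # x becomes 10
    x ^= 2     # x becomes 100
    print a, b, x
}')

echo "$result"
```

This prints 5 7 100.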

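2. Truth values and conditions

In awk, any nonzero numeric value or any non-empty string value is true; any other value (zero or the empty string) is false.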

BEGIN {

if (3.1415927)

print "A strange truth value"

if ("Four Score And Seven Years Ago")

print "A strange truth value"

if (j = 57)

print "A strange truth value"

}

Variable typing and comparison expressions

$ echo ' +3.14' | gawk '{ print $0 == " +3.14" }' True

-| 1

$ echo ' +3.14' | gawk '{ print $0 == "+3.14" }' False

-| 0

$ echo ' +3.14' | gawk '{ print $0 == "3.14" }' False

-| 0

$ echo ' +3.14' | gawk '{ print $0 == 3.14 }' True

-| 1

$ echo ' +3.14' | gawk '{ print $1 == " +3.14" }' False

-| 0

$ echo ' +3.14' | gawk '{ print $1 == "+3.14" }' True

-| 1

$ echo ' +3.14' | gawk '{ print $1 == "3.14" }' False

-| 0

$ echo ' +3.14' | gawk '{ print $1 == 3.14 }' True

-| 1

Comparison operators

Expression Result

x < y True if x is less than y.

x <= y True if x is less than or equal to y.

x > y True if x is greater than y.

x >= y True if x is greater than or equal to y.

x == y True if x is equal to y.

x != y True if x is not equal to y.

x ~ y True if the string x matches the regexp denoted by y.

x !~ y True if the string x does not match the regexp denoted by y.

subscript in array True if the array array has an element with the subscript subscript.

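Boolean expressions

boolean1 && boolean2

True only if both boolean1 and boolean2 are true.

boolean1 || boolean2

True if at least one of boolean1 and boolean2 is true.

! boolean

True if boolean is false, and false if boolean is true.

Conditional expressions

selector ? if-true-exp : if-false-exp

x >= 0 ? x : -x

The value is x when x >= 0 and -x when x < 0, i.e. the absolute value of x.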
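A minimal sketch of the conditional and Boolean operators, run on three made-up numbers. Boolean expressions evaluate to 1 for true and 0 for false, which is what the second command prints; the variable names are assumptions for the example.

```shell
# The conditional expression yields the absolute value of each input number.
abs_values=$(printf '%s\n' -3 0 4 | awk '{ print ($1 >= 0 ? $1 : -$1) }')

# For each number: positive and even, negative or greater than 3, logical NOT.
flags=$(printf '%s\n' -3 0 4 |
    awk '{ print $1, ($1 > 0 && $1 % 2 == 0), ($1 < 0 || $1 > 3), !($1) }')

echo "$abs_values"
echo "$flags"
```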

3. Function calls

awk '{ print "The square root of", $1, "is", sqrt($1) }'

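4. Operator precedence

The operators below are listed from highest to lowest precedence:

(...)

Grouping.

$

Field reference.

++ --

Increment, decrement.

^ **

Exponentiation.

+ - !

Unary plus, unary minus, logical NOT.

* / %

Multiplication, division, remainder.

+ -

Addition, subtraction.

String concatenation

No special symbol is used; the operands are simply written side by side.

< <= == != > >= >> | |&

Relational operators and redirection.

~ !~

Match, no match.

in

Array membership.

&&

Logical AND.

||

Logical OR.

?:

Conditional.

= += -= *= /= %= ^= **=

Assignment.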

VI. Patterns, actions, and variables

1. Pattern elements

/regular expression/

/foo|bar|baz/ { buzzwords++ }

expression

$ awk '$1 == "foo" { print $2 }' BBS-list

$ awk '$1 ~ /foo/ { print $2 }' BBS-list

$ awk '/2400/ && /foo/' BBS-list

$ awk '/2400/ || /foo/' BBS-list

pat1, pat2

awk '$1 == "on", $1 == "off"' myfile

BEGIN

END

$ awk '

> BEGIN { print "Analysis of \"foo\"" }

> /foo/ { ++n }

> END { print "\"foo\" appears", n, "times." }' BBS-list

-| Analysis of "foo"

-| "foo" appears 4 times.

BEGINFILE

ENDFILE

Special patterns for you to supply startup or cleanup actions to be done on a per-file basis. (See BEGINFILE/ENDFILE.)

empty

The empty pattern matches every input record.
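Of the pattern types above, the range pattern pat1, pat2 is probably the least obvious. A small sketch on invented input: it selects every record from the one that matches the first pattern through the one that matches the second, both included.

```shell
# Hypothetical input, one word per line; the range starts at "on" and ends at "off".
between=$(printf 'a\non\nb\nc\noff\nd\n' | awk '$1 == "on", $1 == "off"')

printf '%s\n' "$between"
```

The records a and d fall outside the range and are not printed.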

2. Using shell variables in programs

printf "Enter search pattern: "

read pattern

awk -v pat="$pattern" '$0 ~ pat { nmatches++ }

END { print nmatches, "found" }' /path/to/data

3. Actions

[pattern] { action }

pattern [{ action }]

...

function name(args) { ... }

...

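An action consists of one or more statements enclosed in braces. Statements are separated by newlines or semicolons. When a rule has a pattern but no action, the default action is to print the record.

awk supports the following statements:

Expressions:

Call functions or assign values to variables.

Control statements:

if, for, while, do

Compound statements:

One or more statements enclosed in braces, used to group statements in the body of an if, while, do, or for statement.

Input statements:

getline

Output statements:

print and printf

Deletion statements:

delete, which removes array elements.

4. Control statements in actions

If-else statement: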

if (condition) then-body [else else-body]

if (x % 2 == 0)

print "x is even"

else

print "x is odd"

While statement:

while (condition)

body

awk '{

i = 1

while (i <= 3) {

print $i

i++

}

}' inventory-shipped

Do-while statement:

do

body

while (condition)

{

i = 1

do {

print $0

i++

} while (i <= 10)

}

For statement:

for (initialization; condition; increment)

body

awk '{

for (i = 1; i <= 3; i++)

print $i

}' inventory-shipped

Switch statement (a gawk extension):

switch (expression) {

case value or regular expression:

case-body

default:

default-body

}

switch (NR * 2 + 1) {

case 3:

case "11":

print NR - 1

break

case /2[[:digit:]]+/:

print NR

default:

print NR + 1

case -1:

print NR * -1

}

Break statement:

# find smallest divisor of num

{

num = $1

for (div = 2; div * div <= num; div++) {

if (num % div == 0)

break

}

if (num % div == 0)

printf "Smallest divisor of %d is %d\n", num, div

else

printf "%d is prime\n", num

}

# find smallest divisor of num

{

num = $1

for (div = 2; ; div++) {

if (num % div == 0) {

printf "Smallest divisor of %d is %d\n", num, div

break

}

if (div * div > num) {

printf "%d is prime\n", num

break

}

}

}

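Continue statement:

Used only inside a loop (for, while, or do); it skips the rest of the loop body and starts the next iteration.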

BEGIN {

for (x = 0; x <= 20; x++) {

if (x == 5)

continue

printf "%d ", x

}

print ""

}

Next statement:

Forces awk to stop processing the current record immediately and move on to the next record.

NF != 4 {

err = sprintf("%s:%d: skipped: NF != 4\n", FILENAME, FNR)

print err > "/dev/stderr"

next

}
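A reduced sketch of the same idea, with invented input and without the error message: records that do not have exactly four fields are skipped by next, so the second rule never sees them.

```shell
# The middle record has only two fields and is skipped.
valid=$(printf 'a b c d\nx y\ne f g h\n' | awk 'NF != 4 { next } { print $1 }')

printf '%s\n' "$valid"
```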

Exit statement:

exit [return code]

BEGIN {

if (("date" | getline date_now) <= 0) {

print "Can't get system date" > "/dev/stderr"

exit 1

}

print "current date is", date_now

close("date")

}
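The return code given to exit becomes the exit status of awk, which the calling shell can inspect. A minimal sketch with made-up input and an arbitrary code of 7:

```shell
status=0

# awk stops at the record equal to 2, so only the first record is printed.
out=$(printf '1\n2\n3\n' | awk '$1 == 2 { exit 7 } { print }') || status=$?

echo "output: $out"
echo "exit status: $status"
```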

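5. Built-in variables

Built-in variables that control awk:

FS

Field separator; the default is a single space.

IGNORECASE

If IGNORECASE is nonzero or non-null, matching is case-insensitive (gawk only).

OFS

Output field separator.

ORS

Output record separator.

RS

Record separator.

Built-in variables that convey information:

FNR

Record number within the current file; reset to zero each time a new input file is started.

NF

Number of fields in the current record.

NR

Total number of records read so far; not reset when a new input file is started.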
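The difference between FNR and NR only shows up with more than one input file. The sketch below creates two small temporary files (the names one and two and their contents are assumptions made for the example) and prints both counters for every record.

```shell
# Create two throwaway input files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one"
printf 'c\n' > "$tmpdir/two"

# FNR restarts at 1 for the second file, while NR keeps counting.
counts=$(awk '{ print FNR, NR }' "$tmpdir/one" "$tmpdir/two")

printf '%s\n' "$counts"
rm -rf "$tmpdir"
```

The last line of output is "1 3": the first record of the second file is the third record overall.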