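awk Knowledge Summary

The awk command:
    One of the "three musketeers" of Linux text processing, alongside grep and sed
    ls -l `which awk`: show the path of awk (on many Linux systems it is a symlink to gawk)
    GNU awk = gawk
    Basic usage:
        gawk [options] 'program' file file ...
            program: PATTERN {ACTION STATEMENT}
                An action is made up of statements, separated by ;
            ACTION: print, printf
            Note: the code inside {} is executed for every input line that matches PATTERN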
           
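    Options:
        -F fs: sets the input field separator; the default is whitespace (blanks and tabs)
            Example: awk -F: '{print $1,$2}' /etc/passwd
                    Prints the first two :-separated fields
                awk -F: '{print $1,$2,"password"}' /etc/passwd
                    Prints the first two :-separated fields, followed by the word password on every line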
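To try -F without touching the real /etc/passwd, here is a minimal sketch. The two passwd-style lines are made-up sample data, not taken from a real system.

```shell
# Two hypothetical passwd-style records, used only for illustration.
sample='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/zsh'

# Split each line on ":" and print the first and seventh fields.
result=$(printf '%s\n' "$sample" | awk -F: '{print $1, $7}')
printf '%s\n' "$result"
```

Each output line should hold the user name and the login shell, separated by a single space.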
   
       
       
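        1. The awk output command print
            print item1,item2,...
            Key points:
                1> Items are separated by commas in the program; on output they are joined by the output field separator (OFS);
                2> Each item can be a string, a number, a field of the current record ($n), a variable, or an awk expression; numbers are implicitly converted to strings for output;
                3> If the items after print are omitted, it is equivalent to print $0; to output an empty line, use print "";

                Example: awk -F: '{print }' /etc/passwd
                    awk -F: '{print $0}' /etc/passwd
                    Both print the entire file, since $0 stands for the whole line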
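The role of the comma in print is easy to miss, so here is a small sketch contrasting it with plain juxtaposition. The input string 'a b' is an arbitrary sample.

```shell
# With a comma, items are joined by OFS (a single space by default).
with_comma=$(echo 'a b' | awk '{print $1, $2}')
# Without a comma, adjacent items are concatenated into one string.
without_comma=$(echo 'a b' | awk '{print $1 $2}')
printf '%s\n%s\n' "$with_comma" "$without_comma"
```

The first line keeps the space between the two fields; the second runs them together.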
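        2. Variables
            To set a variable from the command line, use -v var=value; inside the program, awk uses its own assignment syntax.
            2.1 Built-in variables
                FS: input field separator; the default is whitespace
                    Example: awk -v FS=":" '{print $1,$3}' /etc/passwd

                RS: input record separator; the default is a newline
                    Example: awk -v RS=" " '{print $0}' /etc/passwd
                            Records are split at spaces rather than at newlines, so every space in the input also starts a new output line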
                           
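                OFS: output field separator; the default is a single space
                    Example: awk -v OFS="---------" '{print $1,$7}' /etc/passwd
                            No -F: is given, so fields are split on whitespace; $7 is usually empty and the separator ends up at the end of each line
                        awk -v FS=":" -v OFS="---------" '{print $1,$7}' /etc/passwd
                            The two fields are joined by ---------

                ORS: output record separator; the default is a newline
                    Example: awk -v FS=":" -v ORS=" " '{print $1,$7}' /etc/passwd
                            Everything is printed on one line, with records separated by a space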
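The four separator variables can be tried on tiny inputs. This is a sketch with made-up sample strings; only FS, OFS and RS are exercised here.

```shell
# Read ":"-separated fields and write them back "-"-separated.
ofs_demo=$(echo 'root:x:0' | awk -v FS=':' -v OFS='-' '{print $1, $3}')
# RS=" " makes every space-delimited chunk its own record.
# printf emits no trailing newline, so the last record is exactly "c".
rs_demo=$(printf 'a b c' | awk -v RS=' ' '{print NR ": " $0}')
printf '%s\n%s\n' "$ofs_demo" "$rs_demo"
```

The second command numbers the three records, which shows that the split happened at spaces and not at line ends.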
                           
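                NF: number of fields in the current record
                    Example: awk -F: '{print NF}' /etc/passwd
                            Prints the number of fields on each line
                        awk -F: '{print $NF}' /etc/passwd
                            Prints the last field of each line

                NR: record (line) number, counted cumulatively across all files
                    Example: awk '{print NR}' /etc/passwd /etc/issue
                            Prints a running line number for every line of both files

                        awk '{print NR,$0}' /etc/passwd /etc/issue
                            Prefixes every line with its line number

                FNR: record (line) number, counted separately for each file
                    Example: awk '{print FNR,$0}' /etc/passwd /etc/issue
                        Line numbers restart at 1 for each file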
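The difference between NR and FNR only shows up with more than one input file. The sketch below creates two throwaway files (names and contents are hypothetical) and prints both counters side by side.

```shell
# Create two small temporary files with made-up content.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n' > "$dir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(awk '{print NR, FNR, $0}' "$dir/one.txt" "$dir/two.txt")
printf '%s\n' "$counts"
rm -r "$dir"
```

On the line that comes from the second file, NR is 3 while FNR is back to 1.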
                       
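                FILENAME: name of the current input file
                    Example: awk '{print FILENAME,$0}' /etc/passwd
                        Every line is prefixed with /etc/passwd

                ARGC: number of command-line arguments (the command name counts; options and the program text do not)
                    Example: awk '{print ARGC}' /etc/passwd
                            Prints 2 on every line

                ARGV: array holding the command-line arguments
                    Example: awk '{print ARGV[0]}' /etc/passwd
                            Prints awk
                        awk '{print ARGV[1]}' /etc/passwd
                            Prints /etc/passwd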
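ARGC and ARGV can be inspected from a BEGIN block, which avoids printing the same value once per input line. In this sketch the operands first and second are placeholder words, not real files; because the program has only a BEGIN rule, awk never tries to open them.

```shell
# Only BEGIN runs here, so the operands are never opened as files.
args=$(awk 'BEGIN { print ARGC; for (i = 1; i < ARGC; i++) print i, ARGV[i] }' first second)
printf '%s\n' "$args"
```

ARGV[0] is left out on purpose, since it holds the name the interpreter was started with (awk, gawk, and so on) and varies between systems.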
           
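            2.2 User-defined variables
                -v var=val:
                    Variable names are case-sensitive

                Where variables can be defined:
                    (1) Inside the program;
                        Example: awk '{file="passwd";print file,$1}' /etc/passwd
                                Every line is prefixed with passwd

                    (2) With the -v option;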
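A common use of -v is passing a threshold from the shell into the program. A minimal sketch, where limit is a hypothetical variable name and the three numbers are sample input:

```shell
# "limit" is set on the command line and read inside the awk program.
over=$(printf '3\n7\n12\n' | awk -v limit=5 '$1 > limit {print $1}')
printf '%s\n' "$over"
```

Only the values greater than 5 are printed.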

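        3. The printf command
            Format: printf format, item1,item2,...

            Example: awk 'BEGIN{printf "%d\n",6}'
                    Prints the number 6 followed by a newline

            Key points:
                1> format is required;
                2> No newline is added automatically; include \n explicitly where needed
                3> format needs one format specifier for each of the items that follow

            Format specifiers: each starts with % and ends with a conversion character
                %c: prints a single character (a numeric argument is treated as a character code)
                %d,%i: decimal integer
                %e,%E: number in scientific notation
                %f: floating-point number
                %g,%G: scientific or floating-point notation, whichever is shorter
                %s: string
                %u: unsigned integer
                %%: a literal %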
           
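            Modifiers:
                #[.#]: the first # sets the display width, e.g. %30s; the second # sets the precision after the decimal point
                    Example: awk -F: '{printf "%20s %20d\n", $1,$3}' /etc/passwd
                            Output is right-aligned

                -: left-align
                    Example: awk -F: '{printf "%-20s %-20d\n", $1,$3}' /etc/passwd
                            Output is left-aligned because of the added minus sign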
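Width and alignment are easier to see with visible delimiters around each column. A small sketch on one sample record (the brackets are there only to make the padding visible):

```shell
# %-8s pads the name on the right; %5d pads the number on the left.
row=$(echo 'root:x:0' | awk -F: '{printf "[%-8s][%5d]\n", $1, $3}')
printf '%s\n' "$row"
```

The name is followed by four padding spaces and the number is preceded by four.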
                   
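        4. Operators
            Arithmetic operators:
                x+y, x-y, x*y, x/y, x^y, x%y
                -x: negation
                +x: converts to a number

                Example: awk -F: '$3>500{print $0}' /etc/passwd
                    Prints the lines whose UID is greater than 500 (this uses the comparison operator >)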
           
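            String operations:
                String concatenation: there is no operator symbol; expressions written next to each other are concatenated

            Assignment operators:
                = += -= *= /= %= ^=
                ++ --

            Pattern-matching operators:
                ~   matches
                !~  does not match
                Example: awk -F: '$1~/root/ {print $7}' /etc/passwd
                        Prints $7 when $1 matches /root/

            Logical operators:
                &&
                ||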
               
            Conditional expression:
                selector?if-true-expression:if-false-expression
                Example: awk -F: '{usertype=($3>=500)?"common user":"sysuser or admin";printf "%20s:%-s\n",$1,usertype}' /etc/passwd
                        A UID of 500 or more is a common user; anything else is sysuser or admin
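The conditional expression reads most clearly when its result is assigned to a variable. A minimal sketch on two made-up records, with type as a hypothetical variable name and 500 as the cutoff used in these notes:

```shell
# Pick a label per record with the ?: operator, then print name and label.
labels=$(printf 'root:x:0\nalice:x:1000\n' |
  awk -F: '{ type = ($3 >= 500) ? "common" : "system"; print $1, type }')
printf '%s\n' "$labels"
```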
                       
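            Function calls:

        5. PATTERN
            (1) /regular expression/: only lines matched by /regular expression/ are processed
                Example: awk -F: '/^root\>/{print $0}' /etc/passwd
                        Prints all lines that start with the word root (\> is a gawk word-boundary operator)

            (2) relational expression: evaluates to true or false; in general a result that is nonzero or a non-empty string is "true", anything else is "false";
                Example: awk -F: '$3>=500{print $1,$3}' /etc/passwd
                    awk -F: '$5~/root/{print $0}' /etc/passwd

            (3) line ranges: similar to address ranges in sed or vim; written as startpattern, endpattern. awk has no bare line-number addresses, so line numbers are tested through NR, e.g. NR>=2 && NR<=10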
   
            (4) BEGIN/END: special patterns

                BEGIN runs once, before the first input line is read; END runs once, after the last input line has been processed;
                Example: awk  -F: 'BEGIN{print "username","shell\n-------------------------"}$7~/bash\>/{print $1,$7}END{print "------------------------------"}' /etc/passwd

                    awk  -F: 'BEGIN{username="username";shell="shell";printf "%10s%10s\n",username,shell;print "---------------------------"}$7~/bash\>/{printf "%10s%10s\n",$1,$7}END{print "---------------------------"}' /etc/passwd

            (5) empty: the empty pattern matches every line;
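Several pattern types can be combined in one program. The sketch below uses a BEGIN rule, a range pattern, and an END rule on five sample lines; the marker words START and STOP are arbitrary.

```shell
# BEGIN runs first, the range rule fires from START through STOP,
# and END runs after the last line (NR still holds the line count).
report=$(printf 'a\nSTART\nb\nSTOP\nc\n' |
  awk 'BEGIN { print "begin" }
       /START/,/STOP/ { print NR ": " $0 }
       END { print "total " NR }')
printf '%s\n' "$report"
```

Lines 1 and 5 fall outside the range and are skipped, but they still count toward NR.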
           
        6. Common actions
            (1) Expressions
            (2) Control statements
            (3) Input statements
            (4) Output statements

        7. Control statements
            if (condition) statement [ else statement ]
            while (condition) statement
            do statement while (condition)
            for (expr1; expr2; expr3) statement
            for (var in array) statement
            break
            continue
            delete array[index]
            delete array
            exit [ expression ]
            { statements }
           
            7.1 if-else

                Syntax: if (condition) statement [ else statement ]
                    if (condition) { statements; } [ else { statements; }]

                    Example: awk -F: '{if ($3>=500) print $1," is a common user." }' /etc/passwd
                        awk -F: '{if ($3>=500) {print $1," is a common user."} else {print $1," is a system user or admin."}}' /etc/passwd
                        awk '{if (NF>6) print NF, $0 }' /etc/inittab
                            Prints the field count and the whole line for lines with more than 6 fields

                Usage: test a condition on the whole line, or on fields of the line, that awk has read;
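Since /etc/inittab is missing on many current systems, here is a self-contained if-else sketch on two sample lines. The labels long and short are arbitrary.

```shell
# Branch on the number of fields in each line.
kinds=$(printf 'a b c\nd\n' |
  awk '{ if (NF > 2) { print NF, "long" } else { print NF, "short" } }')
printf '%s\n' "$kinds"
```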
               
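            7.2 while loop
                Syntax: while (condition) statement
                    while (condition) { statements }
                    The loop runs while the condition is true and exits once it becomes false;

                Usage: typically used to loop over the fields of the current line;

                    Example: awk '{i=1;while(i<=NF){printf "%20s:%d\n",$i,length($i); i++}}' /etc/inittab
                            Prints every field of every line together with its length
                        awk '{i=1;while(i<=NF){if (length($i)>5) {printf "%20s:%d\n",$i,length($i);} i++}}' /etc/inittab
                            Same, but only for fields longer than 5 characters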
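The field-by-field while loop can be checked on a single sample line instead of a system file. A minimal sketch with three made-up words:

```shell
# Walk the fields of the line and print each one with its length.
lens=$(echo 'alpha be gamma' |
  awk '{ i = 1; while (i <= NF) { printf "%s:%d\n", $i, length($i); i++ } }')
printf '%s\n' "$lens"
```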
                   
            7.3 do-while loop
                Syntax: do statement while (condition)
                    do { do-while-body }  while (condition)
                    Meaning: the loop body is executed at least once;
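The "at least once" behaviour shows when the condition is false from the start. In this sketch the starting value 10 and the bound 5 are arbitrary:

```shell
# The condition i < 5 is already false, yet the body still runs one time.
once=$(awk 'BEGIN { i = 10; do { print "i=" i; i++ } while (i < 5) }')
printf '%s\n' "$once"
```

A plain while loop with the same condition would print nothing.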
                   
            7.4 for loop
                Syntax: for (expr1; expr2; expr3) statement
                    for (expr1; expr2; expr3) { statements }

                    for (variable assignment; condition; iteration process) { for-body }

                    Example: awk '{for(i=1;i<=NF;i++) {printf "%s:%d\n", $i, length($i)}}' /etc/inittab

                awk also has a special form of the for loop for iterating over the elements of an array:
                    Syntax: for (var in array) { for-body }
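A typical use of the three-part for loop is accumulating over the fields of a line. A minimal sketch that sums four sample numbers (s is a hypothetical accumulator name):

```shell
# Add up every field of the line and print the total.
sum=$(echo '1 2 3 4' | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }')
printf '%s\n' "$sum"
```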
                   
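            7.5 switch (a gawk extension)
                Syntax: switch (expression) {case VALUE or /REGEXP/: statement; ...; default: statementN}

            7.6 break and continue
                break: exits the innermost enclosing loop (awk's break takes no count argument)
                continue: ends the current iteration early and moves straight on to the next one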
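Both statements can be seen in one loop. In this sketch the bounds 10 and 6 are arbitrary: odd numbers are skipped with continue and the loop is abandoned with break once i passes 6.

```shell
# continue skips odd values; break leaves the loop at the first even value above 6.
picked=$(awk 'BEGIN {
  for (i = 1; i <= 10; i++) {
    if (i % 2) continue
    if (i > 6) break
    print i
  }
}')
printf '%s\n' "$picked"
```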
               
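            7.7 next
                Stops processing the current line early and moves on to the next line

                Example: awk -F: '{if($3%2!=0) next;print $1,$3}' /etc/passwd
                        Prints the user name and UID only for lines with an even UID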
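The same idea on self-contained sample data, where the second field stands in for the UID (the four records are made up):

```shell
# next drops lines whose second field is odd; only the rest reach print.
evens=$(printf 'a:1\nb:2\nc:3\nd:4\n' |
  awk -F: '{ if ($2 % 2 != 0) next; print $1, $2 }')
printf '%s\n' "$evens"
```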
           
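        8. Arrays

            Associative arrays: array[index-expression]

                index-expression:
                    Any string can be used;
                    If an array element does not exist yet, referencing it makes awk create the element and initialize its value to the empty string;
                        So to test whether an element exists in an array, use "index in array";

                    a["mon"]="Monday"
                    print a["mon"]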
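The side effect of referencing a missing element can be observed directly. A small sketch, with the keys mon, tue and wed chosen only for illustration:

```shell
# "in" tests membership without creating anything;
# reading a["wed"] creates that element as a side effect.
membership=$(awk 'BEGIN {
  a["mon"] = "Monday"
  if ("tue" in a) { print "tue: present" } else { print "tue: absent" }
  if (a["wed"] == "") { print "wed: empty" }
  if ("wed" in a) { print "wed: now present" }
}')
printf '%s\n' "$membership"
```

The last line is printed only because the comparison on the line before it touched a["wed"].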
                   
                To iterate over every element of an array, use: for (var in array) { for body }

                     Note: var takes each index of array in turn (in no guaranteed order); print the value with print array[var]

                Example: count how many times each word occurs
                    Over the whole file:
                    awk '{for(i=1;i<=NF;i++) {count[$i]++}}END{for(j in count) {print j,count[j]}}' awk.txt

                    Line by line, clearing the counts after each line:
                    awk '{for(i=1;i<=NF;i++) {count[$i]++};for(j in count) {print j,count[j]};delete count;print "---------------"}' awk.txt

                    Count TCP connections by state:
                    ss -tan | awk '!/^State/{state[$1]++}END{for (i in state) {print i,state[i]}}'
                    netstat -tan | awk '/^tcp/{state[$NF]++}END{for(i in state){print i,state[i]}}'

                Exercise: count how many times each IP appears in the httpd access log;
                    awk '{ip[$1]++}END{for(i in ip){print i,ip[i]}}' /var/log/httpd/access_log
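The counting idiom can be tried without a log file. This sketch counts words in two sample lines and pipes the result through sort, because the traversal order of for (var in array) is not defined:

```shell
# Tally every word, then print the tallies in a stable (sorted) order.
counts=$(printf 'a b a\nc a b\n' |
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print w, count[w] }' | sort)
printf '%s\n' "$counts"
```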
                   
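        9. Functions

            9.1 Built-in functions
                Numeric:
                    rand(): returns a random number n with 0 <= n < 1;

                String handling:
                    length([s]): returns the length of string s ($0 if s is omitted)
                    sub(r, s [, t]): searches string t ($0 if omitted) for the pattern r and replaces the first match with the string s;
                        sub(/ab/,"AB",$0)

                    gsub(r, s [, t]): searches string t ($0 if omitted) for the pattern r and replaces every match with the string s;

                    split(s, a [, r]): splits string s using r as the separator, stores the pieces in array a (a[1], a[2], ...), and returns the number of pieces;

                        netstat -tan | awk '/^tcp/{len=split($5,client,":");ip[client[len-1]]++}END{for(i in ip){print i,ip[i]}}'

                    substr(s, i [, n]): returns the substring of s that starts at position i (counting from 1) and is n characters long;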
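The string functions can be compared side by side on one sample line. In this sketch the variable names line, all, n, p and parts are hypothetical, and the address 10.0.0.1:80 is made up.

```shell
out=$(echo 'foo bar foo' | awk '{
  line = $0
  sub(/foo/, "X", line)                  # replaces the first match only
  all = $0
  n = gsub(/foo/, "X", all)              # replaces every match, returns the count
  parts = split("10.0.0.1:80", p, ":")   # p[1] is the address, p[2] the port
  print line
  print all, n
  print p[1], p[2], parts
  print substr("abcdef", 2, 3), length("abcdef")
}')
printf '%s\n' "$out"
```

sub and gsub modify their third argument in place, which is why the line is copied into a variable first.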
               
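                Time functions (gawk):
                    systime(): returns the current time as a timestamp (seconds since the epoch);

                Bitwise functions (gawk):
                    and(v1,v2): bitwise AND of the arguments

            9.2 User-defined functions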
                function f_name(p,q)
                {
                    ...
                }
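To make the skeleton above concrete, here is a minimal sketch of a user-defined function. The name max and the sample numbers are hypothetical; the function returns the larger of its two arguments.

```shell
# Define max(p, q) at the top level of the program, then call it per line.
larger=$(printf '3 9\n8 2\n' | awk '
function max(p, q) { return (p > q) ? p : q }
{ print max($1, $2) }')
printf '%s\n' "$larger"
```

Function definitions sit outside any pattern-action rule and can be called from every rule in the program.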

Original article: https://www.cnblogs.com/yajing-zh/p/4878232.html