A Quick Summary of Basic awk Usage

1a. choose rows where column 3 is larger than column 5:
awk '$3>$5' input.txt > output.txt
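if the file starts with a header line (an assumption about the input), NR==1 keeps it alongside the matching rows:
awk 'NR==1 || $3>$5' input.txt > output.txt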

1b. calculate the sum of columns 2 and 3 and append it to the end of each row:
awk '{print $0,$2+$3}' input.txt
or replace the first column with the sum:
awk '{$1=$2+$3;print}' input.txt
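to control the formatting of the computed column (two decimal places here, an arbitrary choice), printf can be used instead of print:
awk '{printf "%s\t%.2f\n",$0,$2+$3}' input.txt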

2. show rows 20 through 80 (more flexible than head):
awk 'NR>=20&&NR<=80' input.txt > output.txt
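on a large file, exiting after line 80 saves reading the rest (a minor optimization, same output):
awk 'NR>80{exit}NR>=20' input.txt > output.txt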

3. calculate the average of column 2:
awk '{x+=$2}END{print x/NR}' input.txt
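note that NR counts every line, including blank ones; if blank lines should be skipped, a separate counter avoids distorting the average (and a division by zero on empty input):
awk '$2!=""{x+=$2;n++}END{if(n>0)print x/n}' input.txt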

4. extract columns 2, 4 and 5 (cut):
awk '{print $2,$4,$5}' input.txt > output.txt
or
awk 'BEGIN{OFS="\t"}{print $2,$4,$5}' input.txt
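if the input itself is tab-separated (an assumption about the data), set the input separator as well:
awk -F'\t' -v OFS='\t' '{print $2,$4,$5}' input.txt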

5. (more complicated) join two files on column 1 (more flexible than join):
awk 'BEGIN{while((getline<"file1.txt")>0)l[$1]=$0}$1 in l{print $0"\t"l[$1]}' file2.txt > output.txt
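a sketch of a left-join variant that keeps every line of file2.txt and prints "NA" (an arbitrary placeholder) when the key is missing from file1.txt:
awk 'BEGIN{while((getline<"file1.txt")>0)l[$1]=$0}{print $0"\t"(($1 in l)?l[$1]:"NA")}' file2.txt > output.txt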

6. count the occurrences of each value in column 2 (uniq -c):
awk '{l[$2]++}END{for (x in l) print x,l[x]}' input.txt
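to list the most frequent values first, print the count in the first column and pipe through sort:
awk '{l[$2]++}END{for (x in l) print l[x],x}' input.txt | sort -k1,1nr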

7. apply "uniq" on column 2, printing only the first occurrence (uniq):
awk '!($2 in l){print;l[$2]=1}' input.txt
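a shorter common idiom with the same effect, relying on l[$2] being 0 (false) the first time a value is seen:
awk '!l[$2]++' input.txt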

8. count different words (wc):
awk '{for(i=1;i<=NF;++i)c[$i]++}END{for (x in c) print x,c[x]}' input.txt
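for a plain total word count (what wc -w reports), summing NF is enough:
awk '{n+=NF}END{print n}' input.txt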

9. deal with simple CSV:
awk -F, '{print $1,$2}'
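-F, breaks on quoted fields that contain commas; GNU awk 4.0+ (an assumption about the environment) offers FPAT to describe the fields instead, shown here on a hypothetical input.csv:
gawk -v FPAT='([^,]+)|("[^"]+")' '{print $1,$2}' input.csv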

10. regex (egrep):
awk '/^test[0-9]+/' input.txt
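unlike egrep, the match can be restricted to a single column:
awk '$2 ~ /^test[0-9]+$/' input.txt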

11. substitution (sed is simpler):
awk 'BEGIN{OFS="\t"}{sub(/test/, "no", $0);print}' input.txt
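sub replaces only the first match on each line; gsub replaces all of them, and passing a field (here $2, an arbitrary choice) instead of $0 limits the change to that column:
awk '{gsub(/test/,"no",$2);print}' input.txt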

Original post: https://www.cnblogs.com/buttonwood/p/awk.html