Using awk

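-F sets the field separator, here the string ' "p_words":'. print $2 prints the second field, i.e. the text after the first occurrence of the separator (up to the next occurrence, if the separator appears again on the line).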

awk -F ' "p_words":' '{print $2}'
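A minimal sketch of the difference between $2 and "everything after the separator", assuming a hypothetical input line in which the separator occurs twice. The function names second_field and after_first are illustrative only; index() and substr() are standard awk string functions.

```sh
# $2 only: the text between the 1st and 2nd occurrence of the separator.
second_field() {
  awk -F ' "p_words":' '{print $2}' "$@"
}

# Everything after the 1st occurrence, even if the separator repeats.
# index() returns the 1-based start of the literal string, or 0 if absent.
after_first() {
  awk '{
    sep = " \"p_words\":"
    i = index($0, sep)
    if (i > 0) print substr($0, i + length(sep))
  }' "$@"
}
```

For the line x "p_words": y "p_words": z, second_field prints " y" while after_first prints ' y "p_words": z'.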

Count the words in each passage (the "p_words" field) of the file, one count per line:

cat data/train_middle-processed.json | awk -F ' "p_words":' '{print $2}'|  awk -F ', "p_q_relation":' '{print $1}' | awk '{print NF}'
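The three awk stages can also be folded into a single awk program. This is a sketch, assuming each line is one JSON object in which p_words is followed by p_q_relation, as the separators above imply; passage_word_counts is a hypothetical name. Like the original pipeline, it counts whitespace-separated tokens, and a line without p_words prints 0.

```sh
passage_word_counts() {
  awk -F ' "p_words":' '{
    # $2 starts right after the first " \"p_words\":" on the line;
    # cut it at ", \"p_q_relation\":" (parts[1] is the passage).
    split($2, parts, ", \"p_q_relation\":")
    # split() with " " uses default whitespace splitting and returns the count.
    print split(parts[1], words, " ")
  }' "$@"
}
```

Usage: passage_word_counts data/train_middle-processed.json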

Counting the words in a line (NF is the number of fields in the current line):

echo 'he said no SDG JCD DDDV .' | awk '{print NF}'
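A small sketch of NF per line and a running total over a whole input, roughly what wc -w reports. With the default field separator, runs of blanks count as one separator and leading or trailing blanks are ignored. The function names are illustrative.

```sh
# Number of fields on each line.
words_per_line() {
  awk '{print NF}' "$@"
}

# Total number of fields over all lines; n + 0 prints 0 for empty input.
total_words() {
  awk '{n += NF} END {print n + 0}' "$@"
}
```

For the example sentence above, words_per_line prints 7.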

Counting word frequencies:

Given two sentences in words.txt:

the day is sunny the the
the sunny is is

We want:

the 4
is 3
sunny 2
day 1

Command:
awk -F" " '{for(i=1;i<=NF;i++){array[$i]+=1;}} END{for(s in array){print s" "array[s];}}' words.txt|sort -nr -k 2
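The same count wrapped as a function, as a sketch. The order of for (w in count) is unspecified in awk, so the output is sorted explicitly: -k2,2nr sorts on the count only (the original -k 2 extends the key to the end of the line, which is equivalent here), -k1,1 breaks ties alphabetically, and LC_ALL=C makes that order byte-wise and reproducible. The default field separator already splits on blanks, so -F" " is dropped. word_freq is a hypothetical name.

```sh
word_freq() {
  # Count every field on every line, then print "word count" pairs.
  awk '{for (i = 1; i <= NF; i++) count[$i]++}
       END {for (w in count) print w, count[w]}' "$@" |
    LC_ALL=C sort -k2,2nr -k1,1
}
```

Usage: word_freq words.txt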

Computing a sum and an average:

File (inputfile):

1 50
2 30
3 20
4 50

Command:

# awk -F' ' '{sum+=$2;count+=1} END{print "SUM:"sum"\nAVG:"sum/count}' inputfile
SUM:150
AVG:37.5
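A slightly more defensive sketch of the same calculation. It is an illustrative variant, not the original command: lines with fewer than two fields (e.g. blank lines) are skipped, and an empty input prints AVG:n/a instead of dividing by zero. sum_avg is a hypothetical name.

```sh
sum_avg() {
  awk 'NF >= 2 { sum += $2; n++ }
       END {
         # sum + 0 forces a number, so empty input prints SUM:0, not SUM:
         print "SUM:" (sum + 0)
         if (n > 0) print "AVG:" sum / n; else print "AVG:n/a"
       }' "$@"
}
```

Usage: sum_avg inputfile prints SUM:150 and AVG:37.5 for the file above.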

Usage in the project:

1. Frequency of each value in length.txt, most frequent first:

cat length.txt |  awk -F" " '{for(i=1;i<=NF;i++){array[$i]+=1;}} END{for(s in array){print s" "array[s];}}' |sort -nr -k 2
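If length.txt holds one value per line (an assumption; its layout is not shown), the same table can be produced with sort and uniq -c instead of an awk array. This is an alternative technique, sketched under that assumption; the final awk swaps uniq's "count value" into the "value count" order used above, and value_freq is a hypothetical name.

```sh
value_freq() {
  # uniq -c needs equal lines to be adjacent, hence the first sort.
  sort "$@" | uniq -c |
    awk '{print $2, $1}' |
    LC_ALL=C sort -k2,2nr -k1,1n
}
```

Usage: value_freq length.txt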

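2. Distribution of passage lengths (how many passages have each word count), most common length first: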

cat data/train_middle-processed.json | awk -F ' "p_words":' '{print $2}'|  awk -F ', "p_q_relation":' '{print $1}'  |awk '{print NF}' | awk -F" " '{for(i=1;i<=NF;i++){array[$i]+=1;}} END{for(s in array){print s" "array[s];}}'|sort -nr -k 2
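The whole chain can also run as one awk program that builds the length histogram directly and adds each length's share of all passages. This is a sketch under the same assumption as before (p_words followed by p_q_relation on each line); the percentage column and the name passage_length_hist are illustrative additions. As in the original chain, a line without p_words counts as length 0.

```sh
passage_length_hist() {
  awk -F ' "p_words":' '
    {
      split($2, parts, ", \"p_q_relation\":")
      n = split(parts[1], words, " ")   # words in this passage
      hist[n]++
      total++
    }
    END {
      # length, number of passages, percentage of all passages
      for (len in hist)
        printf "%d %d %.1f%%\n", len, hist[len], 100 * hist[len] / total
    }' "$@" | LC_ALL=C sort -k2,2nr -k1,1n
}
```

Usage: passage_length_hist data/train_middle-processed.json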
Original post: https://www.cnblogs.com/hozhangel/p/9442293.html