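At work I needed to split a line of text into multiple lines based on one field in a file, and awk seemed like the right tool. Notes below.
Problem:
Source file: assume two fields in total, separated by "|". The second field is itself delimited by "#", but some rows have a second field with no "#", and some have an empty second field.
Requirement: if the second field contains "#", split that record into multiple records on "#"; rows with no "#" or no value stay unchanged.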
202143108500|#0_1000_VOICE#0_1000_VOICE#0_1000_VOICE#0_TRAFFIC#0_TRAFFIC#0_TRAFFIC
202121366359|#0_1000_VOICE#0_TRAFFIC
202143108500|#0_1000_VOICE#0_1000_VOICE#0_1000_VOICE#0_TRAFFIC#0_TRAFFIC#0_TRAFFIC
202121366359|#0_1000_VOICE#0_TRAFFIC
202113492312|W_GH_YYM
202132164529|
Solving it with awk:
1. Turn each line containing "#" into multiple lines
awk -F "|" -vOFS="|" '{l=split($2,arr,"#");for(i=1;i<l;i++){$2=arr[i+1];print}}' ./test.txt
Result:
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202121366359|0_1000_VOICE
202121366359|0_TRAFFIC
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202121366359|0_1000_VOICE
202121366359|0_TRAFFIC
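The loop in step 1 starts from arr[i+1], that is arr[2], because every second-field value in the sample data begins with "#", so split() puts an empty string in arr[1]. A minimal sketch to see this, using a made-up value "#a#b" rather than the real data (the variable name out is only for illustration):

```shell
# Assumption: "#a#b" is a hypothetical stand-in for a second-field value.
# split() returns the number of pieces; a leading "#" yields an empty first piece.
out=$(printf '#a#b\n' | awk '{
    n = split($0, arr, "#")          # n counts the empty piece before the first "#"
    print n
    for (i = 1; i <= n; i++)
        print "[" arr[i] "]"         # brackets make the empty piece visible
}')
printf '%s\n' "$out"
```

If a value did not start with "#", arr[1] would hold real data and the step 1 command would drop it, so the command relies on that leading "#".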
2. Filter out the lines that do not contain "#"
awk -F "|" '$2!~/#/{print}' ./test.txt
Result:
202113492312|W_GH_YYM
202132164529|
These two steps together solve the problem. Write the results to a new file, a.txt:
awk -F "|" -vOFS="|" '{l=split($2,arr,"#");for(i=1;i<l;i++){$2=arr[i+1];print}}' ./test.txt >a.txt
awk -F "|" '$2!~/#/{print}' ./test.txt >>a.txt
a.txt:
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202121366359|0_1000_VOICE
202121366359|0_TRAFFIC
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202121366359|0_1000_VOICE
202121366359|0_TRAFFIC
202113492312|W_GH_YYM
202132164529|
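Because the two commands run one after the other, the rows without "#" always land at the end of a.txt, so the original row order is lost. If order matters, both cases could be handled in a single pass. The following is only a sketch, not the method used above: the file names sample.txt and sample_out.txt and the three sample rows are made up for illustration, and like the step 1 command it assumes a value containing "#" also starts with "#".

```shell
# Hypothetical scratch input: one row to split, one without "#", one with an empty field.
cat > sample.txt <<'EOF'
202121366359|#0_1000_VOICE#0_TRAFFIC
202113492312|W_GH_YYM
202132164529|
EOF

awk -F '|' -v OFS='|' '
{
    n = split($2, parts, "#")        # n is 0 for an empty field, 1 when there is no "#"
    if (n < 2) { print; next }       # nothing to split: keep the row as it is
    for (i = 2; i <= n; i++) {       # parts[1] is the empty piece before the leading "#"
        $2 = parts[i]                # assigning $2 rebuilds the row using OFS
        print
    }
}' sample.txt > sample_out.txt

cat sample_out.txt
```

With this input, sample_out.txt holds the two split rows for 202121366359 first, followed by the W_GH_YYM row and the empty-field row, in the same order as the input.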