Notes on Linux text-processing tools

I have 1000 text files (0.txt ~ 999.txt), each about 11 MB, roughly 11 GB in total. I want to combine the contents of these 1000 files, in random order, into a single file.

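I tried:

cat *.txt | shuf > random

Memory usage reached 96% at around the 8-second mark and then stopped climbing. The whole run finished in about 55 seconds and did exactly what I needed.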
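A small-scale sketch of the same pipeline may help show what it does: shuf shuffles whole lines, so the output holds the same lines as the concatenated input, only in a different order. The scratch directory and the three tiny files below are stand-ins invented for illustration, not the real data set. As far as I know, GNU shuf reads its entire input into memory before writing anything, which would be consistent with the memory plateau noted above.

```shell
# Hypothetical scratch directory standing in for the real data directory.
workdir=$(mktemp -d)

# Create three tiny stand-ins for 0.txt ~ 999.txt (two lines each).
for i in 0 1 2; do
    printf 'file%s-line1\nfile%s-line2\n' "$i" "$i" > "$workdir/$i.txt"
done

# Concatenate every file and shuffle the resulting lines.
cat "$workdir"/*.txt | shuf > "$workdir/random"

# Sanity check: the set of lines is unchanged, only the order differs.
cat "$workdir"/*.txt | sort > "$workdir/expected.sorted"
sort "$workdir/random" > "$workdir/actual.sorted"
cmp "$workdir/expected.sorted" "$workdir/actual.sorted" && echo "same lines, shuffled order"
```

If the input were larger than available memory, this approach would probably need a different strategy; the notes above only cover the case where it fits.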

Given a text file that may contain multiple lines, each with several words separated by spaces, output the 100th through 500th words (the closed interval [100,500]).

tr '\n' ' ' < inputfile | cut -d' ' -f 100-500 > outputfile
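The idea is to turn every newline into a space so the whole file becomes one long space-separated line, and then let cut select a field range. Here is a minimal sketch on a made-up sample file, using the range 3-5 instead of 100-500 so the result is easy to check by eye. The sample contents and variable names are assumptions for illustration.

```shell
# Hypothetical sample: eight words spread over three lines.
sample=$(mktemp)
printf 'a b c\nd e f\ng h\n' > "$sample"

# Join all lines into one space-separated line, then keep fields 3 to 5.
words=$(tr '\n' ' ' < "$sample" | cut -d' ' -f 3-5)
echo "$words"   # prints: c d e
```

Note that cut treats every single space as a delimiter, so runs of consecutive spaces would produce empty fields and shift the count. If the input may contain such runs, `tr -s '\n' ' '` should squeeze them into single spaces before cut sees them.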

Convert an edgelist file to a CSV file: add the header "source,target" at the top of the file and replace the spaces with commas.

sed -e '1i source,target' -e 's/ /,/g' test.edgelist > test.csv
or
awk 'BEGIN{print "source,target"}{print $1","$2}' test.edgelist > test.csv
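To check that the two commands agree, here is a small sketch that runs both on a made-up two-edge file and compares the results. The sample edges and the temporary file names are assumptions for illustration. The one-line `1i text` form is, as far as I know, a GNU sed extension, so the sed variant may not work unchanged with other sed implementations.

```shell
# Hypothetical edgelist with two edges, one "source target" pair per line.
edges=$(mktemp)
printf '1 2\n2 3\n' > "$edges"

# Variant 1: insert the header before line 1, then replace every space with a comma.
sed -e '1i source,target' -e 's/ /,/g' "$edges" > "$edges.sed.csv"

# Variant 2: print the header once, then the first two fields joined by a comma.
awk 'BEGIN{print "source,target"}{print $1","$2}' "$edges" > "$edges.awk.csv"

# Both variants should produce identical files for this input.
cmp "$edges.sed.csv" "$edges.awk.csv" && cat "$edges.awk.csv"
```

The two variants only match for clean two-column input. The sed version turns every space into a comma, so a third column such as a weight would be kept, while the awk version prints only the first two fields and drops anything after them.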

Display a CSV file as an aligned table in the terminal:

column -t -s ',' result.csv