Generating a test file with cat and running a script on it

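nohup python -u day_std_cid_list_data_done.py > eee1.log 2>&1 &  # run the Python script in the background: nohup keeps it running after you log out, -u disables output buffering so eee1.log fills in as the script runs, and 2>&1 sends stderr to the same log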

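hadoop fs -cat hdfs://ab/day_std/000000_0 | head -100 >> test_tpy11.txt  # take the first 100 lines of the cluster file 000000_0 and append them to test_tpy11.txt in the current directory (the file is created if it does not exist; use > instead of >> to overwrite it)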

cat test_tpy11.txt | python hp_day_std.py  # test the hp_day_std.py script on the sample data in test_tpy11.txt

Here hp_day_std.py is a script that processes streaming data: it reads its input line by line from standard input, in the following form:

import sys

for line in sys.stdin:
    ...  # process each input line here
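A slightly fuller sketch of such a stdin-driven script is shown here. The record format (tab-separated fields) and the per-line logic inside `process` are hypothetical placeholders, not the actual contents of hp_day_std.py. Moving the logic into a function just makes it testable without a pipe.

```python
import sys
from typing import Iterable, Iterator


def process(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical per-record logic: for each line with at least two
    tab-separated fields, emit "<first field>\t<number of fields>"."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed records
        yield f"{fields[0]}\t{len(fields)}"


def main() -> None:
    # Read records from standard input one line at a time and write results
    # to standard output, so the script can sit at the end of a shell pipe.
    for out in process(sys.stdin):
        print(out)


if __name__ == "__main__":
    main()
```

Run it exactly as above, e.g. cat test_tpy11.txt | python hp_day_std.py. The same read-stdin/write-stdout shape is what Hadoop Streaming expects of a mapper script.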

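In data mining we often update training logs incrementally and need to delete the first n lines of expired data. Doing this directly with sed is slow; after some googling I found a neat trick: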

tail -n +3 old_file > new_file
mv new_file old_file
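This deletes the first 2 lines, and it is faster than the sed command. In general, tail -n +K prints the file starting from line K, so to drop the first n lines use tail -n +(n+1).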
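The same trick can be sketched in Python when the cleanup has to happen inside a data pipeline. This is a minimal sketch assuming the log is a plain line-oriented file; `drop_first_lines` is a hypothetical helper name. Like the tail/mv pair, it streams the file once into a temporary copy and then renames that copy over the original.

```python
import os
import shutil
import tempfile
from itertools import islice


def drop_first_lines(path: str, n: int) -> None:
    """Hypothetical helper: delete the first n lines of `path` in place.

    Roughly equivalent to: tail -n +<n+1> path > tmp && mv tmp path
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary copy next to the original so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            # islice skips the first n lines and streams the rest, so the
            # whole file is never held in memory.
            dst.writelines(islice(src, n, None))
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)  # rename the copy over the original
    except BaseException:
        # Remove the partial copy if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

For example, drop_first_lines("old_file", 2) has the same effect as the two commands above.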