The split command

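Lately the downstream team has kept reporting garbled characters in the files I supply, and they pinned the problem down to one specific record.

The file holds about 2.5 million records, and a single record was in question. The plan was to search this file, several hundred megabytes in size, with a text editor.

I tried Notepad++, but it could not open the file at all.

So I had to split the file first and open the pieces in Notepad++. With "Show All Characters" turned on, I could then show the downstream team that the data our big-data side provides contains no garbled characters.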

$ split --help
Usage: split [OPTION]... [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is 'x'.  With no INPUT, or when INPUT
is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   generate suffixes of length N (default 2)
      --additional-suffix=SUFFIX  append an additional SUFFIX to file names
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes[=FROM]  use numeric suffixes instead of alphabetic;
                                   FROM changes the start value (default 0)
  -e, --elide-empty-files  do not generate empty output files with '-n'
      --filter=COMMAND    write to shell COMMAND; file name is $FILE
  -l, --lines=NUMBER      put NUMBER lines per output file
  -n, --number=CHUNKS     generate CHUNKS output files; see explanation below
  -u, --unbuffered        immediately copy input to output with '-n r/...'
      --verbose           print a diagnostic just before each
                            output file is opened
      --help     display this help and exit
      --version  output version information and exit

SIZE is an integer and optional unit (example: 10M is 10*1024*1024).  Units
are K, M, G, T, P, E, Z, Y (powers of 1024) or KB, MB, ... (powers of 1000).

CHUNKS may be:
N       split into N files based on size of input
K/N     output Kth of N to stdout
l/N     split into N files without splitting lines
l/K/N   output Kth of N to stdout without splitting lines
r/N     like 'l' but use round robin distribution
r/K/N   likewise but only output Kth of N to stdout

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
For complete documentation, run: info coreutils 'split invocation'

If that looks hard to follow, don't worry. Let's go through the main options.

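-b: the size of each output file, in bytes.

-C: the maximum number of bytes per output file, filled with whole lines only; unlike -b, it does not cut in the middle of a line unless a single line is longer than SIZE.

-d: use numeric suffixes instead of alphabetic ones.

-l: the number of lines in each output file.

PREFIX: the prefix for the output file names (default 'x').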
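The difference between -b and -C is easiest to see on a tiny file. The following is a minimal sketch, assuming GNU split; demo.txt and the prefixes byte_ and line_ are made-up names used only for illustration.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# demo.txt is a hypothetical 3-line input; each line is 10 bytes including its newline.
printf 'aaaaaaaaa\nbbbbbbbbb\nccccccccc\n' > demo.txt

# -b cuts at exact byte offsets, so the second line ends up spread over two pieces.
split -b 15 demo.txt byte_

# -C packs only whole lines into each piece, using at most 15 bytes per piece.
split -C 15 demo.txt line_

# Compare the piece sizes produced by the two modes.
wc -c byte_* line_*
```

For line-oriented data that will be inspected in an editor, -C or -l is usually the safer choice, because no record is cut in half.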

1. Use split to cut the 100 KB file date.file into pieces of 10 KB each:

split -b 10k date.file

ls
Result:
date.file xaa xab xac xad xae xaf xag xah xai xaj

2. The pieces get alphabetic suffixes by default. To get numeric suffixes, add -d; the suffix length can be set with -a length:

split -b 10k date.file -d -a 3

ls
Result:
date.file x000 x001 x002 x003 x004 x005 x006 x007 x008 x009

3. Give the output files a custom file name prefix:

split -b 10k date.file -d -a 3 split_file

ls
Result:
date.file split_file000 split_file001 split_file002 split_file003 split_file004 split_file005 split_file006 split_file007 split_file008 split_file009
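If the pieces are meant to be opened in an editor, a file extension can help. The help text above lists --additional-suffix for this; the sketch below assumes a GNU split recent enough to support it, and generates a stand-in date.file of roughly 100 KB with seq rather than using real data.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: numbered lines, roughly 100 KB in total.
seq 1 20000 > date.file

# Same split as above, but every piece also gets a .txt extension.
split -b 10k -d -a 3 --additional-suffix=.txt date.file split_file

ls
```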

4. Use -l to split by line count, for example into pieces of 1000 lines each:

split -l 1000 date.file
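When a specific record number is known, as in the story at the top, splitting by lines makes it possible to work out which piece holds it. This is a rough sketch, assuming GNU split; records.txt, the part_ prefix, and the target line number are hypothetical.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 2500 numbered lines standing in for real records.
seq 1 2500 > records.txt

# 1000 lines per piece, 3-digit numeric suffixes, prefix "part_".
split -l 1000 -d -a 3 records.txt part_

# The last piece holds whatever is left over after the full pieces.
wc -l part_*

# Locate a given line: pieces are numbered from 0 and hold 1000 lines each.
target=2345
piece=$(printf 'part_%03d' $(( (target - 1) / 1000 )))
offset=$(( (target - 1) % 1000 + 1 ))

# Print the piece name, the line number inside it, and the line itself.
echo "$piece line $offset"
sed -n "${offset}p" "$piece"
```

Only the one piece named by the calculation needs to be opened in the editor.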

What about merging the files back together?

On Linux, use cat:
For example, cat 1.wav 2.wav 3.wav > all.wav joins the bytes of 1.wav, 2.wav, and 3.wav directly into all.wav.
Mind the order of 1.wav, 2.wav, and 3.wav: all.wav is assembled in exactly that order.
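The same approach restores a file from its split pieces. Below is a minimal round-trip sketch, assuming GNU split; the stand-in date.file is generated with seq, and restored.file is a made-up name.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: numbered lines, roughly 100 KB in total.
seq 1 20000 > date.file

# Split into 10 KB pieces with fixed-width numeric suffixes.
split -b 10k -d -a 3 date.file split_file

# The shell expands split_file* in sorted order; with fixed-width suffixes
# that is the same order in which split wrote the pieces.
cat split_file* > restored.file

# cmp exits with status 0 only if the two files are byte-for-byte identical.
cmp date.file restored.file && echo "identical"
```

Keeping the suffix length fixed with -a is what makes the wildcard safe here; the pieces then sort in the order they were written.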