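The sort Command in Detail

The sort command

The sort command is very useful on Linux: it sorts the lines of its input and writes the result to standard output. It can read its input from one or more named files or from stdin.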
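As a small illustration of the two input sources, the sketch below sorts the same three words once from a file and once from a pipe. The scratch file and the sample words are made up for this example, and LC_ALL=C is set only to make the order predictable.

```shell
# Scratch file with made-up sample data (hypothetical, for illustration only).
tmp=$(mktemp)
printf 'pear\napple\nfig\n' > "$tmp"

# Input taken from a named file.
from_file=$(LC_ALL=C sort "$tmp")

# Input taken from standard input through a pipe.
from_stdin=$(printf 'pear\napple\nfig\n' | LC_ALL=C sort)

# A FILE argument of "-" also means standard input.
from_dash=$(LC_ALL=C sort - < "$tmp")

printf '%s\n' "$from_file"
rm -f "$tmp"
```

All three invocations should produce the same sorted lines.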

Syntax

sort (options) (arguments)

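Options

-b: ignore leading blanks on each line;
-c: check whether the input is already sorted, without sorting it;
-d: dictionary order: consider only letters, digits, and blanks when comparing;
-f: fold lowercase letters to uppercase when comparing, so that case is ignored;
-i: consider only printable ASCII characters (octal 040 through 176) when comparing;
-m: merge several files that are already sorted;
-M: compare the first three letters as month-name abbreviations;
-n: compare by numeric value;
-o<output file>: write the sorted result to the specified file;
-r: reverse the sort order;
-t<separator>: use the given character as the field separator;
+<start field>-<end field>: sort on the given fields, from the start field up to the field before the end field. This is the obsolete form; current versions use -k instead.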
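A rough sketch of how -c, -o, and -m from the list above fit together. The directory and file names (a.txt, b.txt) and their contents are invented for the example; the point is only that -c reports through its exit status, -o may safely name one of the input files, and -m expects inputs that are already sorted.

```shell
# Hypothetical working directory and sample files.
work=$(mktemp -d)
printf '3\n1\n2\n' > "$work/a.txt"
printf '5\n4\n' > "$work/b.txt"

# -c only checks: the exit status is non-zero when the input is not sorted.
if LC_ALL=C sort -c -n "$work/a.txt" 2>/dev/null; then
    a_sorted=yes
else
    a_sorted=no
fi

# -o writes the result to a file, which may be the input file itself.
LC_ALL=C sort -n -o "$work/a.txt" "$work/a.txt"
LC_ALL=C sort -n -o "$work/b.txt" "$work/b.txt"

# -m merges inputs that are already sorted, without re-sorting them.
merged=$(LC_ALL=C sort -n -m "$work/a.txt" "$work/b.txt")

printf 'a.txt sorted before -o: %s\n' "$a_sorted"
printf '%s\n' "$merged"
rm -rf "$work"
```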

SORT(1)                                  User Commands                                  SORT(1)

NAME
       sort - sort lines of text files

SYNOPSIS
       sort [OPTION]... [FILE]...
       sort [OPTION]... --files0-from=F

DESCRIPTION
       Write sorted concatenation of all FILE(s) to standard output.

       Mandatory arguments to long options are mandatory for short options too.  Ordering options:

       -b, --ignore-leading-blanks
              ignore leading blanks

       -d, --dictionary-order
              consider only blanks and alphanumeric characters

       -f, --ignore-case
              fold lower case to upper case characters

       -g, --general-numeric-sort
              compare according to general numerical value

       -i, --ignore-nonprinting
              consider only printable characters

       -M, --month-sort
              compare (unknown) < 'JAN' < ... < 'DEC'

       -h, --human-numeric-sort
              compare human readable numbers (e.g., 2K 1G)

       -n, --numeric-sort
              compare according to string numerical value

       -R, --random-sort
              sort by random hash of keys

       --random-source=FILE
              get random bytes from FILE

       -r, --reverse
              reverse the result of comparisons

       --sort=WORD
              sort according to WORD: general-numeric -g, human-numeric -h, month -M, numeric -n, random -R, version -V

       -V, --version-sort
              natural sort of (version) numbers within text

       Other options:

       --batch-size=NMERGE
              merge at most NMERGE inputs at once; for more use temp files

       -c, --check, --check=diagnose-first
              check for sorted input; do not sort

       -C, --check=quiet, --check=silent
              like -c, but do not report first bad line

       --compress-program=PROG
              compress temporaries with PROG; decompress them with PROG -d

       --debug
              annotate the part of the line used to sort, and warn about questionable usage to stderr

       --files0-from=F
              read input from the files specified by NUL-terminated names in file F; If F is - then read names from 
              standard input

       -k, --key=KEYDEF
              sort via a key; KEYDEF gives location and type

       -m, --merge
              merge already sorted files; do not sort

       -o, --output=FILE
              write result to FILE instead of standard output

       -s, --stable
              stabilize sort by disabling last-resort comparison

       -S, --buffer-size=SIZE
              use SIZE for main memory buffer

       -t, --field-separator=SEP
              use SEP instead of non-blank to blank transition

       -T, --temporary-directory=DIR
              use DIR for temporaries, not $TMPDIR or /tmp; multiple options specify multiple directories

       --parallel=N
              change the number of sorts run concurrently to N

       -u, --unique
              with -c, check for strict ordering; without -c, output only the first of an equal run

       -z, --zero-terminated
              end lines with 0 byte, not newline

       --help display this help and exit

       --version
              output version information and exit

       KEYDEF  is  F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character 
       position in the field; both are origin 1, and the stop position defaults to the line's end.
       If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding 
       whitespace.  OPTS is one or more single-letter ordering  options  [bdfgiMhnRrV],  which  override
       global ordering options for that key.  If no key is given, use the entire line as the key.

       SIZE may be followed by the following multiplicative suffixes: % 1% of memory, b 1, K 1024 (default), and so 
       on for M, G, T, P, E, Z, Y.

       With no FILE, or when FILE is -, read standard input.

       *** WARNING *** The locale specified by the environment affects sort order.  Set LC_ALL=C to get the traditional
       sort order that uses native byte values.

       GNU coreutils online help: <http://www.gnu.org/software/coreutils/> Report sort translation bugs to 
       <http://translationproject.org/team/>

AUTHOR
       Written by Mike Haertel and Paul Eggert.

COPYRIGHT
       Copyright © 2013 Free Software Foundation, Inc.  License GPLv3+: GNU GPL version 3 or later 
       <http://gnu.org/licenses/gpl.html>.
       This is free software: you are free to change and redistribute it.  There is NO WARRANTY, to the extent permitted 
       by law.

SEE ALSO
       uniq(1)

       The full documentation for sort is maintained as a Texinfo manual.  If the info and sort programs are properly 
       installed at your site, the command

              info coreutils 'sort invocation'

       should give you access to the complete manual.

GNU coreutils 8.22                                    April 2018                                     SORT(1)
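To make the man page wording for -u a little more concrete, here is a minimal sketch that combines -u with -t and -k so that "equal" means "equal in the first field". The name:number records are made up. The man page above says the first line of each equal run is kept; the check below relies only on one line per name surviving.

```shell
# Made-up records: name:value, with repeated names.
data='alice:30
bob:25
alice:41
carol:19
bob:25'

# With a key, -u keeps one line per distinct key (field 1 here).
one_per_name=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k1,1 -u)

# The names that survive, one per line.
names=$(printf '%s\n' "$one_per_name" | cut -d: -f1)

printf '%s\n' "$one_per_name"
```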

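Arguments

FILE: the list of files to sort.

Examples

sort treats each line of the file or text as one unit and compares lines with each other, character by character from the first character onward. In the C locale the comparison uses ASCII (byte) values; in other locales it follows the locale's collation rules. The lines are then written out in ascending order.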

[root@test sort]# cat sort.txt 
aaa:10:1.1
ccc:30:3.3
ddd:40:4.4
bbb:20:2.2
eee:50:5.5
eee:50:5.6
eee:50:5.5
[root@test sort]# sort sort.txt 
aaa:10:1.1
bbb:20:2.2
ccc:30:3.3
ddd:40:4.4
eee:50:5.5
eee:50:5.5
eee:50:5.6
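The byte-value ordering is easiest to see with mixed input. The following sketch uses made-up words and forces the C locale, as the man page warning suggests; under another locale the order may differ.

```shell
# Made-up input mixing digits, uppercase, and lowercase.
words='banana
Apple
cherry
10
9'

# In the C locale digits sort before uppercase letters, which sort before
# lowercase letters, and "10" precedes "9" because '1' < '9'.
c_order=$(printf '%s\n' "$words" | LC_ALL=C sort)

printf '%s\n' "$c_order"
```

Note that 10 comes before 9 here: without -n the lines are compared as text, not as numbers.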

To drop duplicate lines, use the -u option or uniq:

[root@test sort]# cat sort.txt 
aaa:10:1.1
ccc:30:3.3
ddd:40:4.4
bbb:20:2.2
eee:50:5.5
eee:50:5.5
eee:50:5.6
eee:50:5.5

[root@test sort]# sort -u sort.txt 
aaa:10:1.1
bbb:20:2.2
ccc:30:3.3
ddd:40:4.4
eee:50:5.5
eee:50:5.6

# Alternatively, use the uniq command. Remember: uniq only removes duplicate lines that are adjacent
[root@test sort]# uniq sort.txt 
aaa:10:1.1
ccc:30:3.3
ddd:40:4.4
bbb:20:2.2
eee:50:5.5
eee:50:5.6
eee:50:5.5
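Because uniq only collapses adjacent duplicates, it is normally fed sorted input. A short sketch with a subset of the sample lines, comparing uniq alone, sort piped into uniq, and sort -u:

```shell
# A few of the sample lines, with a duplicate that is not adjacent.
data='aaa:10:1.1
eee:50:5.5
eee:50:5.6
eee:50:5.5'

# uniq alone changes nothing here: the two eee:50:5.5 lines are not adjacent.
only_uniq=$(printf '%s\n' "$data" | uniq)

# Sorting first makes the duplicates adjacent, so uniq can remove them.
sorted_uniq=$(printf '%s\n' "$data" | LC_ALL=C sort | uniq)

# sort -u does both steps at once.
sort_u=$(printf '%s\n' "$data" | LC_ALL=C sort -u)

printf '%s\n' "$sorted_uniq"
```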

Using the -n, -r, -k, and -t options of sort:

[root@test sort]# cat sort.txt 
AAA:BB:CC
aaa:30:1.6
ccc:50:3.3
ddd:20:4.2
bbb:10:2.5
eee:40:5.4
eee:60:5.1

# Sort by the BB column in ascending numeric order
[root@test sort]# sort -nk 2 -t: sort.txt 
AAA:BB:CC
bbb:10:2.5
ddd:20:4.2
aaa:30:1.6
eee:40:5.4
ccc:50:3.3
eee:60:5.1

# Sort by the CC column in descending numeric order
[root@test sort]# sort -nrk 3 -t: sort.txt 
eee:40:5.4
eee:60:5.1
ddd:20:4.2
ccc:50:3.3
bbb:10:2.5
aaa:30:1.6
AAA:BB:CC

# -n compares by numeric value
# -r reverses the order
# -k selects the field to sort on
# -t sets the field separator, a colon here
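Why -n matters is clearer when the numbers have different lengths. This sketch, on made-up name:number lines, compares the same field as text, as numbers, and as numbers in reverse.

```shell
# Made-up data: the second field has values of different lengths.
data='a:9
b:10
c:100
d:25'

# Text comparison of field 2: "10" < "100" < "25" < "9".
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2)

# Numeric comparison of field 2: 9 < 10 < 25 < 100.
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2n)

# Numeric comparison in reverse: 100 > 25 > 10 > 9.
reverse=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2nr)

printf '%s\n' "$numeric"
```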

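The detailed syntax of the -k option:

FStart[.CStart][Modifier][,FEnd[.CEnd][Modifier]]
----------Start----------,----------End----------
 FStart.CStart Modifier   ,  FEnd.CEnd Modifier

This syntax is split by the comma into two parts, Start and End. The Start part consists of three pieces; the Modifier piece is the kind of option letter we met above, such as n and r. Let us focus on FStart and CStart.

CStart may be omitted, in which case the key begins at the start of the field. In FStart.CStart, FStart is the field to use and CStart is the character within that field at which the sort key begins. If the End part is not given, the key extends to the end of the line.

Likewise, in the End part you can give FEnd.CEnd. If you omit .CEnd, the key ends at the last character of field FEnd. Setting CEnd to 0 (zero) means the same thing.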
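The FStart.CStart,FEnd.CEnd form can be sketched on a field that holds a date. The labels and dates below are made up; characters 1 to 4 of field 2 are the year, 6 to 7 the month, and 9 to 10 the day.

```shell
# Made-up data: a label and a YYYY-MM-DD date separated by one space.
data='x 2023-11-05
y 2021-03-17
z 2022-03-02'

# Key 1: characters 6-7 of field 2 (month), numeric.
# Key 2: characters 9-10 of field 2 (day), numeric, used when months tie.
by_month_day=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k2.6,2.7n -k2.9,2.10n)

# A single key covering characters 1-4 of field 2 (year).
by_year=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k2.1,2.4n)

printf '%s\n' "$by_month_day"
```

With -t given, character positions are counted from the first character of the field, which keeps the offsets easy to reason about.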

Columns: name, height, age, salary
[root@test sort]# cat info.txt 
zhangsan 175 20 5000
lisi 170 25 6000
wangwu 170 28 5000
zhangxiaoliu 165 30 6000

# Sort by employee name
[root@test sort]# sort -t ' ' -k 1 info.txt 
lisi 170 25 6000
wangwu 170 28 5000
zhangsan 175 20 5000
zhangxiaoliu 165 30 6000
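Sorting by name only needs the first field: -k 1. (Strictly, -k 1 with no End part runs from field 1 to the end of the line; -k 1,1 restricts the key to the name.)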

# Sort by employee height
[root@test sort]# sort -t ' ' -n -k 2 info.txt 
zhangxiaoliu 165 30 6000
lisi 170 25 6000
wangwu 170 28 5000
zhangsan 175 20 5000
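Sorting by height needs -n because the values are numbers, and the key starts at the second field: -k 2.
lisi and wangwu have the same height, so sort falls back to comparing the whole lines; these begin with the name, so lisi comes first.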
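The tie-breaking behaviour can be observed by putting wangwu before lisi in the input and comparing the default with -s, which the man page describes as disabling the last-resort comparison. The three lines are taken from info.txt but reordered for this sketch.

```shell
# Lines from info.txt, reordered so that wangwu precedes lisi.
data='zhangsan 175 20 5000
wangwu 170 28 5000
lisi 170 25 6000'

# Default: equal heights fall back to a whole-line comparison, so lisi wins.
default_order=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k2,2n)

# -s (stable): equal heights keep their input order, so wangwu stays first.
stable_order=$(printf '%s\n' "$data" | LC_ALL=C sort -s -t ' ' -k2,2n)

printf '%s\n' "$stable_order"
```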

# Sort by height; employees of the same height are ordered by salary, ascending.
[root@test sort]# sort -t ' ' -n -k2 -k4 info.txt 
zhangxiaoliu 165 30 6000
wangwu 170 28 5000
lisi 170 25 6000
zhangsan 175 20 5000
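To sort by height and then salary, give both keys, -k2 -k4: lines are compared on the key starting at field 2 first,
and if they tie, on the key starting at field 4. (More keys can be appended in the same way if needed.)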

# Sort by salary in descending order; equal salaries are ordered by age, ascending
[root@test sort]# sort -t ' ' -n -k4r -k3 info.txt 
lisi 170 25 6000
zhangxiaoliu 165 30 6000
zhangsan 175 20 5000
wangwu 170 28 5000
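Salary is compared first, so the fourth field comes first, and it must be descending, hence -k4r. Age is compared next with -k3, ascending by default, giving -n -k4r -k3.
Both keys are numeric, so -n is given up front here. Note that a key with its own modifier, such as -k4r, does not take on the global -n (the man page says key modifiers override the global ordering options for that key); it is compared as text in reverse, which works here only because every salary has four digits. Attaching n to each key is more robust: -k4rn -k3n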
[root@test sort]# sort -t ' ' -k4rn -k3n info.txt 
lisi 170 25 6000
zhangxiaoliu 165 30 6000
zhangsan 175 20 5000
wangwu 170 28 5000
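The man page text quoted earlier states that key modifiers override the global ordering options for that key. A small sketch of what that means in practice, on made-up data whose numbers have different lengths:

```shell
# Made-up data: field 2 holds numbers of different lengths.
data='a 900
b 10000
c 50'

# Global -n plus a key that carries its own modifier (r): the key does not
# inherit -n, so field 2 is compared as text in reverse: 900, 50, 10000.
global_n_key_r=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -n -k2r)

# Modifiers attached to the key itself: numeric and reversed: 10000, 900, 50.
key_rn=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k2rn)

printf '%s\n' "$key_rn"
```

Spelling out every modifier on the key (here rn) avoids depending on which global options a key does or does not inherit.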

# Sort by the second letter of the name; ties are ordered by salary, descending
[root@test sort]# sort -t ' ' -k1.2,1.2 -k4nr info.txt 
wangwu 170 28 5000
zhangxiaoliu 165 30 6000
zhangsan 175 20 5000
lisi 170 25 6000
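-k1.2  starts the key at the 2nd character of the first field and, with no End part, runs to the end of the line.
       zhangsan and zhangxiaoliu would then be compared as "hangsan..." and "hangxiaoliu..."; s sorts before x, so zhangsan would come first and the salary key would never be consulted.
-k1.2,1.2 limits the comparison to the second letter of the name, so both the Start and the End of the first key must be given, i.e. -k1.2,1.2;
          the salary in the fourth field is then compared numerically in descending order, hence -k4nr.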
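The difference between the open-ended key and the one-character key can be checked side by side. This sketch reuses the four info.txt lines and assumes the C locale.

```shell
# The four lines of info.txt.
data='zhangsan 175 20 5000
lisi 170 25 6000
wangwu 170 28 5000
zhangxiaoliu 165 30 6000'

# Open-ended first key: "hangsan ..." sorts before "hangxiaoliu ...",
# so the salary key is never needed.
open_key=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k1.2 -k4nr)

# One-character first key: the two names with second letter h tie,
# and the descending salary key decides between them.
one_char_key=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k1.2,1.2 -k4nr)

printf '%s\n' "$one_char_key"
```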

Sorting from the second letter of the company's English name onward (text and numbers sorted together):

[root@test sort]# cat company.txt 
dangdang 50 6000
baidu 100 5000
sohu 100 4500
google 110 5000
guge 50 3000

[root@test sort]# sort -t ' ' -k 1.2 company.txt 
baidu 100 5000
dangdang 50 6000
sohu 100 4500
google 110 5000
guge 50 3000

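-k 1.2 sorts on the string that starts at the second character of the first field and, with no End part, runs to the end of the line. baidu and dangdang both have a as the second character, but baidu's third character i sorts before n, so baidu comes first. sohu and google both have o as the second character, but the h of sohu sorts before the second o of google, so sohu precedes google. guge can only come last.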

Sorting on only the second letter of the company name, with ties ordered by salary:

[root@test sort]# cat company.txt 
dangdang 50 6000
baidu 100 5000
sohu 100 4500
google 110 5000
guge 50 3000

# Sort on only the second letter of the company name; ties are ordered by salary, descending
[root@test sort]# sort -t ' ' -k 1.2,1.2 -k3nr company.txt 
dangdang 50 6000
baidu 100 5000
google 110 5000
sohu 100 4500
guge 50 3000

# Sort on only the second letter of the company name; ties are ordered by salary, ascending
[root@test sort]# sort -t ' ' -k 1.2,1.2 -k3n company.txt 
baidu 100 5000
dangdang 50 6000
sohu 100 4500
google 110 5000
guge 50 3000

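Because only the second letter is to be compared, we use -k 1.2,1.2 (which can also be written -k1.2,1.2).

(Why not -k 1.2? Because the End part is omitted, the key would run from the second letter to the end of the line. Lines would then almost never tie on the first key, and -k3nr or -k3n would never get a chance to take effect.)

After the second letter of the company name, the salary is compared with -k3n or -k3nr (also writable as -k 3n or -k 3nr). This field holds numbers, so n must be appended to the field number.

Wrong usage is shown below. The n must come after the field number 3: putting it before the 3 is an error, and putting it before -k as a global option does not give the intended result either.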

[root@test sort]# cat company.txt 
dangdang 50 6000
baidu 100 50000
sohu 100 4500
google 110 5000
guge 50 3000

[root@test sort]# sort -t ' ' -k 1.2,1.2 -k n3 company.txt 
sort: invalid number at field start: invalid count at start of ‘n3’

[root@test sort]# sort -t ' ' -k 1.2,1.2 -nk 3 company.txt 
guge 50 3000
sohu 100 4500
google 110 5000
dangdang 50 6000
baidu 100 50000

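# Some examples found online use -nrk 3,3. That form is also flawed: the global -n and -r apply to the first key as well,
# so every second letter compares as the number 0 and the lines are ordered by salary alone. Here the output merely happens to match the intended order.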
[root@test sort]# sort -t ' ' -k 1.2,1.2 -nrk 3,3 company.txt 
baidu 100 50000
dangdang 50 6000
google 110 5000
sohu 100 4500
guge 50 3000
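To see the two forms diverge, the sketch below changes one salary in the sample (guge is given 9000, an invented value) and runs both the global-option form and the per-key form.

```shell
# company.txt data with guge's salary changed to 9000 (invented value).
data='dangdang 50 6000
baidu 100 50000
sohu 100 4500
google 110 5000
guge 50 9000'

# Global -n -r: the first key also becomes numeric, every letter counts as 0,
# and the result is ordered by salary alone.
global_form=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 1.2,1.2 -nrk 3,3)

# Per-key modifiers: second letter compared as text, salary numeric descending.
per_key_form=$(printf '%s\n' "$data" | LC_ALL=C sort -t ' ' -k 1.2,1.2 -k 3,3nr)

printf '%s\n' "$per_key_form"
```

In the per-key result guge stays last because its second letter u sorts after a and o, regardless of its salary.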