鸟哥的Linux私房菜 (VBird's Linux book) study notes 5

P350

grep -n: show line numbers

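[root@www ~]# grep [-A] [-B] [--color=auto] 'search string' filename
Options and parameters:
-A : may be followed by a number; means "after": besides the matching line, the following n lines are listed too;
-B : may be followed by a number; means "before": besides the matching line, the preceding n lines are listed too;
--color=auto : shows the matched part of the data in color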

Example 1: use dmesg to list kernel messages, then use grep to find the lines containing eth
[root@www ~]# dmesg | grep 'eth'
eth0: RealTek RTL8139 at 0xee846000, 00:90:cc:a6:34:84, IRQ 10
eth0:  Identified 8139 chip type 'RTL-8139C'
eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1
eth0: no IPv6 routers present
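# dmesg lists the messages produced by the kernel! Filtering through grep for the network card info (eth)
# gives the output above. There are no line numbers or colors yet, though. See the next example!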

Example 2: continuing from above, highlight the matched keyword in color and add line numbers:
[root@www ~]# dmesg | grep -n --color=auto 'eth'
247:eth0: RealTek RTL8139 at 0xee846000, 00:90:cc:a6:34:84, IRQ 10
248:eth0:  Identified 8139 chip type 'RTL-8139C'
294:eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1
305:eth0: no IPv6 routers present
# Besides eth being shown in a special color, each line now starts with its line number!

Example 3: continuing from above, also show the two lines before and the three lines after each line that contains the keyword
[root@www ~]# dmesg | grep -n -A3 -B2 --color=auto 'eth'
245-PCI: setting IRQ 10 as level-triggered
246-ACPI: PCI Interrupt 0000:00:0e.0[A] -> Link [LNKB] ...
247:eth0: RealTek RTL8139 at 0xee846000, 00:90:cc:a6:34:84, IRQ 10
248:eth0:  Identified 8139 chip type 'RTL-8139C'
249-input: PC Speaker as /class/input/input2
250-ACPI: PCI Interrupt 0000:00:01.4[B] -> Link [LNKB] ...
251-hdb: ATAPI 48X DVD-ROM DVD-R-RAM CD-R/RW drive, 2048kB Cache, UDMA(66)
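# As shown above, the two lines before match line 247 and the three lines after match line 248 are displayed too!
# This lets you pull out the data around the keyword for analysis!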
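
A quick way to try -n, -A and -B without depending on real dmesg output is a small throwaway file. This is only a sketch: the six lines below are invented, and sample and context_out are placeholder names.

```shell
# Hypothetical six-line "log"; only line 3 contains the keyword.
sample=$(mktemp)
printf '%s\n' 'line one' 'line two' 'eth0: link up' 'line four' 'line five' 'line six' > "$sample"

# -n adds line numbers, -B1 keeps 1 line before and -A2 keeps 2 lines after each match.
context_out=$(grep -n -A2 -B1 'eth' "$sample")
printf '%s\n' "$context_out"
rm -f "$sample"
```

In the output, matching lines carry a colon after the line number (3:) while context lines carry a dash (2-, 4-, 5-), the same convention seen in the dmesg example above.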

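When grep searches data for a string, it works with the "whole line" as its unit! In other words, if a file has 10 lines and two of them contain the string you are searching for, those two lines are shown on the screen and the rest are discarded!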

If you want to get the string the regardless of upper or lower case:

[root@www ~]# grep -in 'the' regular_express.txt
8:I can't finish the test.
9:Oh! The soup taste good.
12:the symbol '*' is represented as start.
14:The gd software is a library for drafting programs.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.
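
To see the line-based behavior and the effect of -i on something small, here is a minimal sketch. The four sample lines are made up for illustration (they are not regular_express.txt), and the variable names are just placeholders.

```shell
# Hypothetical sample data, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'The soup' 'no match here' 'the test' 'OTHER' > "$sample"

# Case-sensitive: only lines containing lowercase "the" are kept.
cs_out=$(grep -n 'the' "$sample")

# Case-insensitive: "The" and the "THE" inside "OTHER" match as well.
ci_out=$(grep -in 'the' "$sample")

printf 'case-sensitive:\n%s\n' "$cs_out"
printf 'case-insensitive:\n%s\n' "$ci_out"
rm -f "$sample"
```

Note that 'OTHER' is kept with -i: grep looks for the string anywhere in the line, not for a standalone word.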

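Exercise 2: use brackets [] to search for a set of characters

If I want to search for the two words test and taste, I notice that they share the common shape 't?st'. In that case I can search like this: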

[root@www ~]# grep -n 't[ae]st' regular_express.txt
8:I can't finish the test.
9:Oh! The soup taste good.
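
The bracket search can be tried on a tiny file to see what it does and does not accept. A minimal sketch with invented sample words (placeholder names sample and set_out):

```shell
# Hypothetical word list: only some entries have "t", then a or e, then "st".
sample=$(mktemp)
printf '%s\n' 'test' 'taste' 'toast' 'tst' 'taest' > "$sample"

# [ae] stands for exactly one character, which must be a or e.
set_out=$(grep -n 't[ae]st' "$sample")
printf '%s\n' "$set_out"
rm -f "$sample"
```

'toast', 'tst' and 'taest' are dropped because none of them has exactly one a or e between a t and st.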

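Got it? No matter how many characters are inside [], it stands for just "one" character, so the example above says that the strings I need are only "tast" or "test". And if you want to find lines that contain oo, use: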

[root@www ~]# grep -n 'oo' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!

But what if I do not want a g in front of oo? Then I can use the negated set [^] to do it:

[root@www ~]# grep -n '[^g]oo' regular_express.txt
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!

In other words, I want oo, but the character before oo must not be g!
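
Lines 18 and 19 may look surprising in that output: "google" also contains "tools" (t before oo), and in "goooooogle" some oo pairs are preceded by an o rather than a g. A small sketch with invented words shows the same effect (sample and neg_out are placeholder names):

```shell
# Hypothetical word list for the negated set.
sample=$(mktemp)
printf '%s\n' 'good' 'food' 'goooogle' 'oops' > "$sample"

# [^g] must match one real character that is not g, followed by oo.
neg_out=$(grep -n '[^g]oo' "$sample")
printf '%s\n' "$neg_out"
rm -f "$sample"
```

'good' is dropped because its only oo follows g, and 'oops' is dropped because nothing precedes its oo: [^g] still has to consume a character.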

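When the characters in a set are consecutive, such as uppercase letters, lowercase letters, or digits, you can write them as [a-z], [A-Z], [0-9]. And what if the string we need is made of digits and letters? Ha! Just write them all together: [a-zA-Z0-9]. For example, to get the lines that contain a digit: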

[root@www ~]# grep -n '[0-9]' regular_express.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.

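However, because the locale affects the encoding order, besides using the minus sign "-" for consecutive ranges, you can also get the results of the previous two tests like this: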

[root@www ~]# grep -n '[^[:lower:]]oo' regular_express.txt
# [:lower:] stands for a-z! See the reference table two subsections back

[root@www ~]# grep -n '[[:digit:]]' regular_express.txt
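
As a small check of the two class-based commands, here is a sketch on invented data. LC_ALL=C is set here only to make the run predictable; that choice is an assumption of this sketch, not something the notes require.

```shell
# Hypothetical sample lines.
sample=$(mktemp)
printf '%s\n' 'price 3183' 'no digits' 'Foo' 'goo' > "$sample"

# [[:digit:]] matches any one digit.
digit_out=$(LC_ALL=C grep -n '[[:digit:]]' "$sample")

# [^[:lower:]] matches any one character that is not a lowercase letter.
lower_out=$(LC_ALL=C grep -n '[^[:lower:]]oo' "$sample")

printf 'digit: %s\nnot-lower before oo: %s\n' "$digit_out" "$lower_out"
rm -f "$sample"
```

Note the double brackets: the outer [] is the set itself and [:digit:] is the class name placed inside it.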

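Exercise 3: line-start and line-end characters ^ $

In Exercise 1 we could find lines containing the. What if I only want the lines where the is at the start of the line? This is where anchor characters come in! We can do this: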

[root@www ~]# grep -n '^the' regular_express.txt
12:the symbol '*' is represented as start.

Now only line 12 is left, because only line 12 begins with the. What if I want to list the lines that begin with a lowercase character? Like this:

[root@www ~]# grep -n '^[a-z]' regular_express.txt
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.

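You can see that every line caught starts with a character that is not uppercase! Note, however, that the keyword part grep marks is not only the first character: grep marks the whole word! Likewise, the command above can be replaced by: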

[root@www ~]# grep -n '^[[:lower:]]' regular_express.txt

OK! And if I do not want the line to begin with an English letter, it can be written like this:

[root@www ~]# grep -n '^[^a-zA-Z]' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
21:# I am VBird
# The command can also be: grep -n '^[^[:alpha:]]' regular_express.txt
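
The two roles of ^ in that last pattern (line-start anchor outside the brackets, negation inside them) can be compared side by side. A minimal sketch with invented lines; LC_ALL=C is used only so that the a-z and A-Z ranges behave predictably.

```shell
# Hypothetical sample lines.
sample=$(mktemp)
printf '%s\n' 'apple' '"Open"' '# note' 'Zed' > "$sample"

# ^ outside [] anchors to the line start: lines beginning with a lowercase letter.
start_lower=$(LC_ALL=C grep -n '^[a-z]' "$sample")

# First ^ anchors, second ^ (inside []) negates: lines NOT beginning with a letter.
start_nonalpha=$(LC_ALL=C grep -n '^[^a-zA-Z]' "$sample")

printf 'lowercase start: %s\n' "$start_lower"
printf 'non-letter start:\n%s\n' "$start_nonalpha"
rm -f "$sample"
```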

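Notice that? The ^ symbol means different things inside and outside the character set brackets []! Inside [] it means "negated selection"; outside [] it anchors the match at the start of the line. Keep the two apart! Thinking the other way around: what if I want to find the lines that end with a period (.)?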

[root@www ~]# grep -n '\.$' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
11:This window is clear.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
20:go! go! Let's go.

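Pay special attention: because the period has another meaning (introduced below), you must use the escape character (\) to remove its special meaning! You may find it odd, though: lines 5~9 also end with . so why are they not printed? This has to do with how software on the Windows platform handles line-break characters! Use cat -A to look at the file from line 5 on, and you will find: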

[root@www ~]# cat -An regular_express.txt | head -n 10 | tail -n 6
     5  However, this dress is about $ 3183 dollars.^M$
     6  GNU is free air not free beer.^M$
     7  Her hair is very beauty.^M$
     8  I can't finish the test.^M$
     9  Oh! The soup taste good.^M$
    10  motorcycle is cheap than car.$

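Chapter 10 discussed how line-break characters differ between Linux and Windows. In the output above we can see that lines 5~9 use the Windows line break (^M$), whereas a normal Linux line should look like line 10 ($ only). So the . is not directly in front of $, and lines 5~9 are not caught! Does that make the meaning of ^ and $ clear?

Now, without looking at the answer below, think about it yourself: if I want to find which lines are "blank lines", that is, lines with no data entered at all, how should I search?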

[root@www ~]# grep -n '^$' regular_express.txt
22:
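
Both anchor cases above (a trailing period hidden behind a carriage return, and empty lines) can be reproduced on a small file. This is a sketch under assumptions: the four lines are invented, and stripping carriage returns with tr -d '\r' is one possible workaround, not something the notes prescribe.

```shell
# Hypothetical file: line 1 ends in CRLF, line 2 in LF, line 3 is empty,
# line 4 holds only a carriage return (an "empty" line saved on Windows).
sample=$(mktemp)
printf 'dos line.\r\nunix line.\n\n\r\n' > "$sample"

# As-is: the \r sits between "." and the line end, so line 1 is missed,
# and line 4 is not empty as far as ^$ is concerned.
dot_raw=$(grep -n '\.$' "$sample")
blank_raw=$(grep -n '^$' "$sample")

# After deleting carriage returns, both patterns see all the expected lines.
dot_fixed=$(tr -d '\r' < "$sample" | grep -n '\.$')
blank_fixed=$(tr -d '\r' < "$sample" | grep -n '^$')

printf 'raw:   [%s] [%s]\n' "$dot_raw" "$blank_raw"
printf 'fixed:\n%s\n%s\n' "$dot_fixed" "$blank_fixed"
rm -f "$sample"
```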

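Since the pattern has only the line start and the line end (^$), it finds the blank lines! Next, suppose you already know that in a shell script or a configuration file, blank lines and lines starting with # (comments) carry no settings. If you want to print the data for someone else to read, you can leave those lines out to save precious paper. How would you do that? Let's use /etc/syslog.conf as the example; check the output for yourself: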

[root@www ~]# cat -n /etc/syslog.conf
# On CentOS the output turns out to have 33 lines, many of them blank or starting with #

[root@www ~]# grep -v '^$' /etc/syslog.conf | grep -v '^#'
# The result has only 10 lines. The first "-v '^$'" means "no blank lines",
# and the second "-v '^#'" means "no lines that start with #"!
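
The same filter can be tried on a made-up config file, which also makes it easy to compare the two-step pipeline with a single grep call that takes both patterns through -e. The file content and variable names below are hypothetical; this is a sketch, not the contents of a real /etc/syslog.conf.

```shell
# Hypothetical config: a comment, a blank line, two settings and an indented comment.
conf=$(mktemp)
printf '%s\n' '# comment' '' 'kern.* /dev/console' '  # indented comment' 'mail.* /var/log/maillog' > "$conf"

# Two-step pipeline, as in the notes.
piped_out=$(grep -v '^$' "$conf" | grep -v '^#')

# One call: with -v, a line is dropped if it matches any of the -e patterns.
single_out=$(grep -v -e '^$' -e '^#' "$conf")

printf '%s\n' "$piped_out"
rm -f "$conf"
```

The indented comment survives in both variants, because '^#' only matches when # is the very first character of the line.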