2017-4-21 Processing the text of a packet capture export with Shell + Python

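    For my graduation project I have spent the last few days turning Modbus packets into hexadecimal form. Wireshark was not much help here, or maybe I just have not found the trick yet. The text processing wore me out: I ran into problems I had never thought about before, and it really drove home the saying that you only regret not having studied more when you finally need the knowledge. I am writing these notes down for later use.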

I. Directory layout

[ root@ssd #] ls /tmp

1.txt   10_BCD.sh   7.sh    get_final.py    README

(1)[ root@ssd #] cat 1.txt  ## 1.txt is the raw capture file, exported from Wireshark as text

No.     Time           Source                Destination           Protocol Length Info
    246 166.994531     192.168.1.100         192.168.1.101         Modbus/TCP 66        Query: Trans:     0; Unit:   1, Func:   3: Read Holding Registers

Frame 246: 66 bytes on wire (528 bits), 66 bytes captured (528 bits) on interface 0
Ethernet II, Src: HonHaiPr_65:5d:39 (1c:3e:84:65:5d:39), Dst: AskeyCom_1c:52:1e (e0:ca:94:1c:52:1e)
    Destination: AskeyCom_1c:52:1e (e0:ca:94:1c:52:1e)
        Address: AskeyCom_1c:52:1e (e0:ca:94:1c:52:1e)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: HonHaiPr_65:5d:39 (1c:3e:84:65:5d:39)
        Address: HonHaiPr_65:5d:39 (1c:3e:84:65:5d:39)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.1.100, Dst: 192.168.1.101
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
    Total Length: 52
    Identification: 0x6971 (26993)
    Flags: 0x02 (Don't Fragment)
    Fragment offset: 0
    Time to live: 128
    Protocol: TCP (6)
    Header checksum: 0x0d39 [validation disabled]
    [Header checksum status: Unverified]
    Source: 192.168.1.100
    Destination: 192.168.1.101
    [Source GeoIP: Unknown]
    [Destination GeoIP: Unknown]
Transmission Control Protocol, Src Port: 58708, Dst Port: 502, Seq: 1, Ack: 1, Len: 12
    Source Port: 58708
    Destination Port: 502
    [Stream index: 1]
    [TCP Segment Len: 12]
    Sequence number: 1    (relative sequence number)
    [Next sequence number: 13    (relative sequence number)]
    Acknowledgment number: 1    (relative ack number)
    Header Length: 20 bytes
    Flags: 0x018 (PSH, ACK)
    Window size value: 16425
    [Calculated window size: 65700]
    [Window size scaling factor: 4]
    Checksum: 0xb0f0 [unverified]
    [Checksum Status: Unverified]
    Urgent pointer: 0
    [SEQ/ACK analysis]
    [PDU Size: 12]
Modbus/TCP
    Transaction Identifier: 0
    Protocol Identifier: 0
    Length: 6
    Unit Identifier: 1
Modbus
    .000 0011 = Function Code: Read Holding Registers (3)
    Reference Number: 0
    Word Count: 10

No.     Time           Source                Destination           Protocol Length Info
    247 167.015547     192.168.1.101         192.168.1.100         Modbus/TCP 83     Response: Trans:     0; Unit:   1, Func:   3: Read Holding Registers

Frame 247: 83 bytes on wire (664 bits), 83 bytes captured (664 bits) on interface 0
Ethernet II, Src: AskeyCom_1c:52:1e (e0:ca:94:1c:52:1e), Dst: HonHaiPr_65:5d:39 (1c:3e:84:65:5d:39)
    Destination: HonHaiPr_65:5d:39 (1c:3e:84:65:5d:39)
        Address: HonHaiPr_65:5d:39 (1c:3e:84:65:5d:39)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: AskeyCom_1c:52:1e (e0:ca:94:1c:52:1e)
        Address: AskeyCom_1c:52:1e (e0:ca:94:1c:52:1e)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.1.101, Dst: 192.168.1.100
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
    Total Length: 69
    Identification: 0x1d8e (7566)
    Flags: 0x02 (Don't Fragment)
    Fragment offset: 0
    Time to live: 64
    Protocol: TCP (6)
    Header checksum: 0x990b [validation disabled]
    [Header checksum status: Unverified]
    Source: 192.168.1.101
    Destination: 192.168.1.100
    [Source GeoIP: Unknown]
    [Destination GeoIP: Unknown]
Transmission Control Protocol, Src Port: 502, Dst Port: 58708, Seq: 1, Ack: 13, Len: 29
    Source Port: 502
    Destination Port: 58708
    [Stream index: 1]
    [TCP Segment Len: 29]
    Sequence number: 1    (relative sequence number)
    [Next sequence number: 30    (relative sequence number)]
    Acknowledgment number: 13    (relative ack number)
    Header Length: 20 bytes
    Flags: 0x018 (PSH, ACK)
    Window size value: 256
    [Calculated window size: 65536]
    [Window size scaling factor: 256]
    Checksum: 0xdaf5 [unverified]
    [Checksum Status: Unverified]
    Urgent pointer: 0
    [SEQ/ACK analysis]
    [PDU Size: 29]
Modbus/TCP
    Transaction Identifier: 0
    Protocol Identifier: 0
    Length: 23
    Unit Identifier: 1
Modbus
    .000 0011 = Function Code: Read Holding Registers (3)
    [Request Frame: 246]
    Byte Count: 20
    Register 0 (UINT16): 0
    Register 1 (UINT16): 0
    Register 2 (UINT16): 0
    Register 3 (UINT16): 1
    Register 4 (UINT16): 0
    Register 5 (UINT16): 0
    Register 6 (UINT16): 0
    Register 7 (UINT16): 0
    Register 8 (UINT16): 0
    Register 9 (UINT16): 0
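
As a cross-check on what the hexadecimal form of the query should look like, the seven Modbus/TCP fields dissected in frame 246 can be packed by hand. This is only a sketch: the field widths follow the Modbus/TCP framing (a 7-byte MBAP header followed by the PDU), and the helper name `pack_read_request` is made up for illustration.

```python
import struct

def pack_read_request(trans_id, proto_id, length, unit_id, func, ref, count):
    """Pack the seven dissected fields into the bytes carried over TCP.

    Big-endian layout: three 16-bit MBAP fields, an 8-bit unit identifier,
    an 8-bit function code, then two 16-bit PDU fields.
    """
    return struct.pack(">HHHBBHH", trans_id, proto_id, length, unit_id,
                       func, ref, count)

# Values taken from the Modbus/TCP and Modbus sections of frame 246.
frame = pack_read_request(0, 0, 6, 1, 3, 0, 10)
print(len(frame))                            # 12, matching "TCP Segment Len: 12"
print(",".join("%02x" % b for b in frame))   # 00,00,00,00,00,06,01,03,00,00,00,0a
```

The 12 packed bytes agree with the segment length Wireshark reports for the query, which is a quick way to confirm that the field order and widths are right.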

 

(2)[ root@ssd #] cat 10_BCD.sh

#!/bin/bash

if [ ! -d test ];then
        mkdir test 
fi

grep -iA57 "Modbus/TCP 66 " *.txt |grep -iA8 "^Modbus/TCP" >test/b.txt
cd test
yum install dos2unix -y --quiet   ## files created on Windows have CRLF line endings (the ^M character) on Linux; dos2unix fixes that
dos2unix b.txt 

cat b.txt |grep "Transaction" |awk -F ":" '{print $2}'|sed 's/^[ 	]*//g'> 111
cat b.txt |grep "Prot" |awk -F ":" '{print $2}'|sed 's/^[ 	]*//g'> 222 
cat b.txt |grep "Leng" |awk -F ":" '{print $2}'|sed 's/^[ 	]*//g'> 333    
cat b.txt |grep "Unit Identifier" |awk -F ":" '{print $2}'|sed 's/^[ 	]*//g'> 444
cat b.txt |grep "Function"|grep "Register" |awk -F ":" '{print $2}'|awk -F "(" '{print $2}'|awk -F ")" '{print $1}'> 555
cat b.txt |grep "Refe" |awk -F ":" '{print $2}'|sed 's/^[ 	]*//g'> 666
cat b.txt |grep "Word"|awk -F ":" '{print $2}'|sed 's/^[ 	]*//g'> 777

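if [ $? -eq 0 ];then 
    paste -d "," 111 222 333 444 555 666 777 > c.txt  
    sed -i '/,,/d' c.txt     ## drop rows that have an empty field in the middle
    line_number=`awk -F "," '$NF=="" {print NR}' c.txt`  ## line numbers of the rows whose last field is empty
    arr=($line_number)   ## turn the string into an array; a bare $arr means arr[0], the first element
    if [ -n "$arr" ];then   ## without this guard sed gets an empty address when no row is incomplete
        sed -i "$arr"',$d' c.txt  ## delete from the first incomplete row to the end of the file; sed is awkward to drive from shell variables, and this command gave me a lot of trouble
    fi
    cd ..
    echo "==== The decimal results are in c.txt under the test directory ====!"
fi 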
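
The field extraction that the script does with grep, awk, and paste can also be sketched in Python with one regular expression per column. This is an illustration under assumptions, not the script's actual method: the names `FIELD_PATTERNS` and `extract_fields` are made up, and the input is assumed to be one "Modbus/TCP" section as it appears in the text export above.

```python
import re

# One pattern per column, in the order 10_BCD.sh pastes the columns together.
FIELD_PATTERNS = [
    r"Transaction Identifier:\s*(\d+)",
    r"Protocol Identifier:\s*(\d+)",
    r"^\s*Length:\s*(\d+)",
    r"Unit Identifier:\s*(\d+)",
    r"Function Code:.*\((\d+)\)",
    r"Reference Number:\s*(\d+)",
    r"Word Count:\s*(\d+)",
]

def extract_fields(section):
    """Return the seven fields of one section as decimal strings,
    or None when any field is missing (an incomplete frame)."""
    values = []
    for pattern in FIELD_PATTERNS:
        match = re.search(pattern, section, re.MULTILINE)
        if match is None:
            return None
        values.append(match.group(1))
    return values

sample = """Modbus/TCP
    Transaction Identifier: 0
    Protocol Identifier: 0
    Length: 6
    Unit Identifier: 1
Modbus
    .000 0011 = Function Code: Read Holding Registers (3)
    Reference Number: 0
    Word Count: 10
"""
print(",".join(extract_fields(sample)))   # 0,0,6,1,3,0,10
```

Returning None for a section with a missing field keeps incomplete frames out of the result at the point where they are parsed, so no later clean-up pass over the merged rows is needed.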

(3)[ root@ssd  # ]  cat get_final.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import subprocess

def run_script(name):    ## run a helper shell script and discard its output
    with open(os.devnull, 'w') as devnull:
        subprocess.call(['/bin/bash', name], stdout=devnull, stderr=devnull)

run_script('10_BCD.sh')    ## extracts the decimal fields into test/c.txt

def num_bcd(num):    ## decimal to hexadecimal, padded to four digits and split into two bytes
    a = '%04x' % num    ## 25 becomes 0019; small values such as 10 are padded to 000a, so no special case is needed
    return a[:2] + ',' + a[2:4] + ','    ## 0019 becomes 00,19,

def fun2(num): ## keep two hexadecimal digits only, e.g. 10 becomes 0a, rather than 00,0a,
    return '%02x' % num + ','


contents = []
with open('test/c.txt') as f:
    for line in f:
        b = line.strip().split(",")  ## the line changes from a string into a list
        if len(b) != 7 or '' in [x.strip() for x in b]:
            continue    ## an empty or missing field means the frame is incomplete, so skip it
        for i in range(len(b)):
            num = int(b[i])
            if i == 3 or i == 4: ## the 4th and 5th numbers of the frame keep only two digits
                contents.append(fun2(num))
            else:
                contents.append(num_bcd(num))

filename = 'new.ini'
with open(filename, 'w') as fobj:
    fobj.writelines(['%s\n' % eachline for eachline in contents])  ## one converted field per line
run_script('7.sh')    ## joins every 7 lines of new.ini into one row of final_Result
print("The result is in the final_Result file!")
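
The conversion rule used in the script (two bytes for most fields, one byte for the fourth and fifth) can also be written per row with format strings and bit operations, which avoids slicing the output of hex(). A minimal sketch: `row_to_hex` is an illustrative name and not part of the original script, and every field is assumed to fit in its byte width.

```python
def row_to_hex(row):
    """Convert one 'a,b,c,d,e,f,g' row of decimals into the comma-separated
    hex layout: fields 4 and 5 take one byte, the others two bytes."""
    parts = []
    for index, text in enumerate(row.strip().split(",")):
        value = int(text)
        if index in (3, 4):              # unit identifier and function code
            parts.append("%02x," % value)
        else:                            # 16-bit fields, high byte first
            parts.append("%02x,%02x," % (value >> 8, value & 0xFF))
    return " ".join(parts)

print(row_to_hex("32,0,6,1,3,0,10"))    # 00,20, 00,00, 00,06, 01, 03, 00,00, 00,0a,
print(row_to_hex("16,0,6,1,3,0,256"))   # 00,10, 00,00, 00,06, 01, 03, 00,00, 01,00,
```

The second call uses the boundary values 16 and 256, where the hexadecimal representation gains a digit. Those are the inputs worth checking in any version of this conversion.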

(4)[ root@ssd  # ]  cat 7.sh

#!/bin/bash

cat new.ini | awk -F "," '{if (NR%7!=0)ORS=" ";else ORS="\n";print}' >final_Result
if [ -f new.ini ];then
    rm -f new.ini
fi
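
The regrouping that 7.sh performs (every 7 consecutive lines become one row) is a plain chunking operation, so it could also be done without a second shell script. A small sketch, assuming the fields are already in a list with one converted field per entry; the function name `join_every` is made up for illustration.

```python
def join_every(lines, size=7, sep=" "):
    """Join consecutive groups of `size` lines into single rows."""
    cleaned = [line.rstrip("\n") for line in lines]
    return [sep.join(cleaned[i:i + size]) for i in range(0, len(cleaned), size)]

# Two frames' worth of converted fields, one field per entry.
fields = ["00,20,", "00,00,", "00,06,", "01,", "03,", "00,00,", "00,0a,"] * 2
for row in join_every(fields):
    print(row)
```

Each printed row has the same layout as a line of final_Result. A trailing group with fewer than 7 entries is still returned as a short row, so an incomplete frame at the end stays visible rather than being silently dropped.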

(5)[ root@ssd  # ]  cat README

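=================== Usage guide ============================
All .txt files are raw capture files!

Note: just run python get_final.py; the resulting frames are saved in the final_Result file

Process:
1. Running python get_final.py first calls 10_BCD.sh, which converts the raw capture file into decimal values. It creates 7 small files in the test directory and finally merges them into c.txt
2. The Python body converts the decimal values to hexadecimal, but the 7 hexadecimal fields of each frame are still spread over separate lines
3. Finally 7.sh is called to put the hexadecimal fields of each frame on one line, which gives the final result, final_Result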

II. Results

[root@ssd modbus]# cat test/c.txt ## this is the format at the start (decimal)
32,0,6,1,3,0,10
32,0,23,1,3,0,10
33,0,6,1,3,0,10
33,0,23,1,3,0,10
34,0,6,1,3,0,10
35,0,6,1,3,0,10
36,0,6,1,3,0,10
37,0,6,1,3,0,10
34,0,23,1,3,0,10
38,0,6,1,3,0,10

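#32,0,6,1,3,0,,  # at first I could not delete rows like this one, which contain two commas with no number between them

#42,0,6,1,3,0,,   # in the shell, awk finds the matching line number, arr turns it into an array, and sed then deletes everything from that line to the end of the file: sed -i $arr',$d' c.txt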
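
The same "cut everything from the first incomplete row" step can be sketched in Python, where the case with no incomplete row needs no special handling. This is an illustration only; the function name `drop_from_first_incomplete` is hypothetical, and a row is treated as incomplete when its last comma-separated field is empty, as in the awk test.

```python
def drop_from_first_incomplete(rows):
    """Keep rows up to, but not including, the first row whose last
    comma-separated field is empty; keep all rows if none is incomplete."""
    kept = []
    for row in rows:
        if row.rstrip("\r\n").split(",")[-1].strip() == "":
            break
        kept.append(row)
    return kept

rows = ["32,0,6,1,3,0,10", "33,0,6,1,3,0,10", "42,0,6,1,3,0,", "43,0,6,1,3,0,10"]
print(drop_from_first_incomplete(rows))   # ['32,0,6,1,3,0,10', '33,0,6,1,3,0,10']
```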

[root@ssd modbus]# cat  final_Result   ## the result has to be in this hexadecimal form

00,20, 00,00, 00,06, 01, 03, 00,00, 00,0a,
00,20, 00,00, 00,17, 01, 03, 00,00, 00,0a,
00,21, 00,00, 00,06, 01, 03, 00,00, 00,0a,
00,21, 00,00, 00,17, 01, 03, 00,00, 00,0a,
00,22, 00,00, 00,06, 01, 03, 00,00, 00,0a,
00,23, 00,00, 00,06, 01, 03, 00,00, 00,0a,
00,24, 00,00, 00,06, 01, 03, 00,00, 00,0a,
00,25, 00,00, 00,06, 01, 03, 00,00, 00,0a,
00,22, 00,00, 00,17, 01, 03, 00,00, 00,0a,
00,26, 00,00, 00,06, 01, 03, 00,00, 00,0a,
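
One way to sanity-check the output is to decode a row of final_Result back into its seven decimal fields and compare it with the matching row of c.txt. A sketch under the assumption that each row holds exactly 12 bytes in the Modbus/TCP field order; `decode_row` is an illustrative helper name.

```python
import struct

def decode_row(row):
    """Turn one '00,20, 00,00, ...' row back into the seven decimal fields."""
    hex_digits = row.replace(",", "").replace(" ", "")
    raw = bytes.fromhex(hex_digits)
    # Three 16-bit fields, two 8-bit fields, two 16-bit fields, all big-endian.
    return struct.unpack(">HHHBBHH", raw)

print(decode_row("00,20, 00,00, 00,06, 01, 03, 00,00, 00,0a,"))
# (32, 0, 6, 1, 3, 0, 10)
```

The decoded tuple matches the first row of c.txt shown above (32,0,6,1,3,0,10), so the round trip from decimal to hexadecimal and back is consistent for that frame.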

Original post: https://www.cnblogs.com/yue-hong/p/6698561.html