Do You Really Understand WebSocket?

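WebSocket is a protocol built on top of TCP. It was first referenced in the HTML5 specification as TCPConnection, a placeholder for a TCP-based socket API. It provides full-duplex communication between the browser and the server. In essence, it keeps a TCP connection open, and the browser and the server talk to each other over that socket.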

This article writes a socket server in Python and walks through the request process step by step.

1. Start the server

import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('127.0.0.1', 8002))
sock.listen(5)
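# Wait for a client to connect
conn, address = sock.accept()
print('Connected:', conn)

Once the socket server is running, it waits for a client to connect; after that, data can be sent and received.

2. Client connection

<body>
  <script>
      var ws = new WebSocket('ws://127.0.0.1:8002')
  </script>
</body>

When the client connects, it does more than open the TCP connection: it also sends a handshake request and waits for the server's response. Only after the server replies is the WebSocket connection established. At this point the server prints:

Connected: <socket.socket fd=200, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 8002), raddr=('127.0.0.1', 9089)>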

3. Establishing the connection (handshake)

import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('127.0.0.1', 8002))
sock.listen(5)
# Wait for a client to connect
conn, address = sock.accept()
# print('Connected:', conn)

# Receive the handshake request
msg = conn.recv(8096)
print(msg)
# Contents of msg:
GET / HTTP/1.1
Host: 127.0.0.1:8002
Connection: Upgrade
Pragma: no-cache
Cache-Control: no-cache
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36
Upgrade: websocket
Origin: http://localhost:63342
Sec-WebSocket-Version: 13
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8
Cookie: m_lvt_b3a3fc356d0af38b811a0ef8d50716b8=1552050219; csrftoken=zMIqQhUHFPKDYrSfRKXlYziC48Hhr9gybHn5dhT1YCxjWWL0hpFpbhpEK2f1ZveI; OUTFOX_SEARCH_USER_ID_NCOO=365670935.15332246; session=eyJfcGVybWFuZW50Ijp0cnVlLCJ1c2VyX2lkIjoxfQ.EKbEWg.e-R5XugNGe1_STfG3K8B3jqDTkk
Sec-WebSocket-Key: NQ+slWNHq4Xy8xlVMKCNtg==
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits

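The handshake request and response must follow these rules:

  • Extract Sec-WebSocket-Key from the handshake request
  • Append the magic string to the Sec-WebSocket-Key value, compute the SHA-1 hash of the result, then Base64-encode the digest (this is hashing and encoding, not encryption)
  • Return the encoded result to the client in the Sec-WebSocket-Accept response header

 Note: the magic string is the fixed GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 defined by the protocol (RFC 6455). It must be used exactly as written and cannot be changed.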
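The rules above can be sketched as a small helper. This is a minimal sketch, not the article's own code: the name `compute_accept` is hypothetical, and the sample key and expected value are the example pair given in RFC 6455 rather than the key from the request shown earlier.

```python
import base64
import hashlib

MAGIC_STRING = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a Sec-WebSocket-Key (hypothetical helper)."""
    # Key + magic string, hashed with SHA-1, then the raw digest is Base64-encoded
    digest = hashlib.sha1((sec_websocket_key + MAGIC_STRING).encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


# Example key/accept pair from RFC 6455
accept = compute_accept("dGhlIHNhbXBsZSBub25jZQ==")
print(accept)  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Note that the Base64 step is applied to the 20 raw digest bytes, not to the hexadecimal string form of the hash.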

import socket
import base64
import hashlib


def get_headers(data):
    """
    Parse the raw handshake request into a dict.
    :param data: raw request bytes received from the client
    :return: dict with 'method', 'url', 'protocol' and one entry per header
    """
    header_dict = {}
    data = str(data, encoding='utf-8')
    # The headers end at the first blank line (CRLF CRLF)
    header, body = data.split('\r\n\r\n', 1)
    header_list = header.split('\r\n')
    for i in range(0, len(header_list)):
        if i == 0:
            # Request line, e.g. "GET / HTTP/1.1"
            if len(header_list[i].split(' ')) == 3:
                header_dict['method'], header_dict['url'], header_dict['protocol'] = header_list[i].split(' ')
        else:
            k, v = header_list[i].split(':', 1)
            header_dict[k] = v.strip()
    return header_dict


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('127.0.0.1', 8002))
sock.listen(5)

# 1. Wait for a client to connect
conn, address = sock.accept()
# 2. Receive the handshake request
data = conn.recv(8096)
headers = get_headers(data)  # parse the request headers

magic_string = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'

# 3. Compute the accept value: Base64(SHA-1(key + magic string))
value = headers['Sec-WebSocket-Key'] + magic_string
ac = base64.b64encode(hashlib.sha1(value.encode('utf-8')).digest())

# 4. Send the handshake response containing the accept value
response_tpl = ("HTTP/1.1 101 Switching Protocols\r\n"
                "Upgrade: websocket\r\n"
                "Connection: Upgrade\r\n"
                "Sec-WebSocket-Accept: %s\r\n"
                "WebSocket-Location: ws://%s%s\r\n\r\n")

response_str = response_tpl % (ac.decode('utf-8'), headers['Host'], headers['url'])
conn.send(bytes(response_str, encoding='utf-8'))

# 5. Receive messages from the client
while True:
    msg = conn.recv(8096)
    if not msg:  # empty bytes means the client closed the connection
        break
    print(msg)

In the browser console, send:

ws.send(123445)

# The server then receives the masked message
b"\x81\x86\x0c'W*=\x15d\x1e8\x12"

What we get is a masked frame rather than readable text. How do we decode it?

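4. Sending and receiving data between client and server

When the client and the server exchange data, every message has to be packed into a frame and unpacked again. The browser's JavaScript WebSocket API already handles packing and unpacking on the client side, but a raw socket server has to implement both by hand.

Unpacking in detail: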

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
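To make the diagram concrete, here is a minimal sketch that pulls the fields of the first two bytes out of a frame. The function name `parse_first_two_bytes` is hypothetical, and the sample assumes the bytes received earlier are `b"\x81\x86\x0c'W*=\x15d\x1e8\x12"`, with the backslash escapes written out.

```python
def parse_first_two_bytes(frame: bytes) -> dict:
    """Split the first two bytes of a WebSocket frame into their bit fields."""
    first, second = frame[0], frame[1]
    return {
        "fin": first >> 7,             # bit 0: final fragment flag
        "rsv": (first >> 4) & 0b111,   # bits 1-3: RSV1..RSV3
        "opcode": first & 0x0F,        # bits 4-7: frame type
        "mask": second >> 7,           # bit 8: is the payload masked?
        "payload_len": second & 0x7F,  # bits 9-15: 7-bit length field
    }


sample = b"\x81\x86\x0c'W*=\x15d\x1e8\x12"
fields = parse_first_two_bytes(sample)
print(fields)
# {'fin': 1, 'rsv': 0, 'opcode': 1, 'mask': 1, 'payload_len': 6}
```

So the sample is a single, final text frame that is masked and carries 6 bytes of payload.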

 

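Detailed explanation

This is the masked data we received:
b"\x81\x86\x0c'W*=\x15d\x1e8\x12"
The numbers along the top of the diagram are bit positions:
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

Take the second byte of the received data and AND it with 127 (0b01111111). The result is the 7-bit payload length field, and there are three cases:

If it is 127: the next 8 bytes hold the real payload length, so the header is 10 bytes long.
If it is 126: the next 2 bytes hold the real payload length, so the header is 4 bytes long.
If it is 125 or less: the value itself is the payload length, so the header is just 2 bytes long. Our sample falls into this case (0x86 & 127 = 6):
  b"\x81\x86"                        b"\x0c'W*=\x15d\x1e8\x12"
  (header)                           (4-byte masking key, then the masked data 123445)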

Explanation from the documentation

The MASK bit simply tells whether the message is encoded. Messages from the client must be masked, so your server should expect this to be 1.
(In fact, section 5.1 of the spec says that your server must disconnect from a client if that client sends an unmasked message.)
When sending a frame back to the client, do not mask it and do not set the mask bit. We'll explain masking later.
Note: You have to mask messages even when using a secure socket.
RSV1-3 can be ignored, they are for extensions.
The opcode field defines how to interpret the payload data: 0x0 for continuation, 0x1 for text (which is always encoded in UTF-8),
0x2 for binary, and other so-called "control codes" that will be discussed later. In this version of WebSockets,
0x3 to 0x7 and 0xB to 0xF have no meaning. The FIN bit tells whether this is the last message in a series.
If it's 0, then the server will keep listening for more parts of the message;
otherwise, the server should consider the message delivered. More on this later.
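As a rough illustration of the quoted rules, the sketch below checks the MASK bit of a client frame and names its opcode. The helper name `classify_client_frame` is hypothetical; the values 0x8 (close), 0x9 (ping) and 0xA (pong) are the control opcodes defined in RFC 6455, which the quoted text only refers to as "control codes".

```python
OPCODES = {
    0x0: "continuation",
    0x1: "text",
    0x2: "binary",
    0x8: "close",
    0x9: "ping",
    0xA: "pong",
}


def classify_client_frame(first_two: bytes) -> tuple:
    """Return (is_final, frame_kind) for the first two bytes of a client frame."""
    first, second = first_two[0], first_two[1]
    if not second & 0x80:
        # Frames from the client must be masked; a server should drop the connection
        raise ValueError("client frames must be masked")
    is_final = bool(first & 0x80)                   # FIN bit
    kind = OPCODES.get(first & 0x0F, "reserved")    # low 4 bits of the first byte
    return is_final, kind


print(classify_client_frame(b"\x81\x86"))  # (True, 'text')
print(classify_client_frame(b"\x01\x86"))  # (False, 'text')
print(classify_client_frame(b"\x88\x80"))  # (True, 'close')
```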

Decoding Payload Length
To read the payload data, you must know when to stop reading. That's why the payload length is important to know. Unfortunately, this is somewhat complicated. To read it, follow these steps:
① Read bits 9-15 (inclusive) and interpret that as an unsigned integer. If it's 125 or less, then that's the length; you're done. If it's 126, go to step 2. If it's 127, go to step 3.
② Read the next 16 bits and interpret those as an unsigned integer. You're done.
③ Read the next 64 bits and interpret those as an unsigned integer (The most significant bit MUST be 0). You're done.

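In other words, we read bits 9-15 by ANDing the second byte with 0b01111111:
① If the result is 125 or less, it is the payload length itself, and the header is just the first two bytes.
② If it is 126, read the next 16 bits to get the payload length.
③ If it is 127, read the next 64 bits to get the payload length.

So the header size (before the masking key) is:
     127: 10 bytes (2 + 8)
     126: 4 bytes (2 + 2)
     125 or less: 2 bytes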
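The three cases can be sketched with the standard `struct` module. This is an illustrative helper (the name `decode_payload_length` is an assumption, not part of the original code) that returns both the payload length and the header size, excluding the masking key. The second and third sample headers are constructed by hand for the example.

```python
import struct


def decode_payload_length(frame: bytes) -> tuple:
    """Return (payload_length, header_size); header_size excludes the masking key."""
    length = frame[1] & 0x7F
    if length == 126:
        # The next 2 bytes are a big-endian unsigned 16-bit length
        (length,) = struct.unpack("!H", frame[2:4])
        return length, 4
    if length == 127:
        # The next 8 bytes are a big-endian unsigned 64-bit length
        (length,) = struct.unpack("!Q", frame[2:10])
        return length, 10
    return length, 2


short = decode_payload_length(b"\x81\x86")
medium = decode_payload_length(b"\x81\xfe\x01\x00")
large = decode_payload_length(b"\x81\xff" + (70000).to_bytes(8, "big"))
print(short, medium, large)  # (6, 2) (256, 4) (70000, 10)
```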


Reading and Unmasking the Data
If the MASK bit was set (and it should be, for client-to-server messages),
read the next 4 octets (32 bits); this is the masking key. Once the payload length and masking key is decoded,
you can go ahead and read that number of bytes from the socket. Let's call the data ENCODED, and the key MASK. To get DECODED,
loop through the octets (bytes a.k.a. characters for text data) of ENCODED and XOR the octet with the (i modulo 4)th octet of MASK.
In pseudo-code (that happens to be valid JavaScript):

var DECODED = "";
for (var i = 0; i < ENCODED.length; i++) {
    DECODED[i] = ENCODED[i] ^ MASK[i % 4];
}

Now you can figure out what DECODED means depending on your application.

In other words, in all three cases above the header is followed by the 4-byte masking key, and the data comes after that. We are still not done: the data has to be unmasked with the calculation above before it is readable. The same loop in Python:

    # `decoded` holds the still-masked payload bytes, `mask` the 4-byte masking key
    bytes_list = bytearray()
    for i in range(len(decoded)):
        chunk = decoded[i] ^ mask[i % 4]
        bytes_list.append(chunk)
    body = str(bytes_list, encoding='utf-8')
    print(body)
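As a worked example, the unmasking step can be applied to the frame received earlier. This sketch assumes the received bytes are `b"\x81\x86\x0c'W*=\x15d\x1e8\x12"` (backslash escapes written out) and uses a hypothetical helper named `unmask`. Since the length field is 6, the masking key is bytes 2-5 and the payload starts at byte 6.

```python
def unmask(mask: bytes, masked: bytes) -> bytes:
    """XOR every payload byte with the masking-key byte at index i % 4."""
    return bytes(b ^ mask[i % 4] for i, b in enumerate(masked))


frame = b"\x81\x86\x0c'W*=\x15d\x1e8\x12"
mask = frame[2:6]            # 4-byte masking key: 0x0c 0x27 0x57 0x2a
masked_payload = frame[6:]   # 6 masked payload bytes
text = unmask(mask, masked_payload).decode("utf-8")
print(text)  # 123445
```

For instance, the first payload byte 0x3d XOR the first key byte 0x0c gives 0x31, the character "1".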

Complete code

import socket
import base64
import hashlib


def get_headers(data):
    """
    Parse the raw handshake request into a dict.
    :param data: raw request bytes received from the client
    :return: dict with 'method', 'url', 'protocol' and one entry per header
    """
    header_dict = {}
    data = str(data, encoding='utf-8')
    # The headers end at the first blank line (CRLF CRLF)
    header, body = data.split('\r\n\r\n', 1)
    header_list = header.split('\r\n')
    for i in range(0, len(header_list)):
        if i == 0:
            # Request line, e.g. "GET / HTTP/1.1"
            if len(header_list[i].split(' ')) == 3:
                header_dict['method'], header_dict['url'], header_dict['protocol'] = header_list[i].split(' ')
        else:
            k, v = header_list[i].split(':', 1)
            header_dict[k] = v.strip()
    return header_dict


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('127.0.0.1', 8002))
sock.listen(5)

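# 1. Wait for a client to connect
conn, address = sock.accept()
# 2. Receive the handshake request
data = conn.recv(8096)
headers = get_headers(data)  # parse the request headers

magic_string = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'

# 3. Compute the accept value: Base64(SHA-1(key + magic string))
value = headers['Sec-WebSocket-Key'] + magic_string
ac = base64.b64encode(hashlib.sha1(value.encode('utf-8')).digest())

# 4. Build the handshake response containing the accept value
response_tpl = ("HTTP/1.1 101 Switching Protocols\r\n"
                "Upgrade: websocket\r\n"
                "Connection: Upgrade\r\n"
                "Sec-WebSocket-Accept: %s\r\n"
                "WebSocket-Location: ws://%s%s\r\n\r\n")

response_str = response_tpl % (ac.decode('utf-8'), headers['Host'], headers['url'])
# Send the handshake response
conn.send(bytes(response_str, encoding='utf-8'))

# 5. Receive messages from the client
# (this simple loop assumes each recv() returns exactly one complete frame)
while True:
    info = conn.recv(8096)
    if not info:  # empty bytes means the client closed the connection
        break
    if info[0] & 0x0F == 0x8:  # opcode 0x8: the client sent a close frame
        break

    # The low 7 bits of the second byte are the payload length field
    payload_len = info[1] & 127
    if payload_len == 126:
        # 2-byte extended length, then the 4-byte masking key
        extend_payload_len = info[2:4]
        mask = info[4:8]
        decoded = info[8:]
    elif payload_len == 127:
        # 8-byte extended length, then the 4-byte masking key
        extend_payload_len = info[2:10]
        mask = info[10:14]
        decoded = info[14:]
    else:
        # No extended length: the masking key follows the 2-byte header
        extend_payload_len = None
        mask = info[2:6]
        decoded = info[6:]

    # Unmask: XOR each payload byte with mask[i % 4]
    bytes_list = bytearray()
    for i in range(len(decoded)):
        chunk = decoded[i] ^ mask[i % 4]
        bytes_list.append(chunk)
    body = str(bytes_list, encoding='utf-8')
    print(body)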
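The code above is the server.
<script>
    var ws = new WebSocket('ws://127.0.0.1:8002')
</script>
The code above is the client.
Usage: start the server first, then open the client page.
In the browser console, enter ws.send('你好'); the server receives the message and prints it.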
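One limitation of reading with a single `recv` call is that TCP may deliver a frame in several pieces, or several frames at once. A possible way around this, sketched below under that assumption, is to read exactly as many bytes as each part of the frame needs. The helpers `recv_exact` and `read_frame` are hypothetical names, not part of the original code, and `socket.socketpair()` stands in for a real browser connection so the sketch can run on its own.

```python
import socket
import struct


def recv_exact(conn, n):
    """Read exactly n bytes from the socket, or raise if it closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before the frame was complete")
        buf.extend(chunk)
    return bytes(buf)


def read_frame(conn):
    """Read one frame and return (opcode, unmasked payload bytes)."""
    first, second = recv_exact(conn, 2)
    opcode = first & 0x0F
    masked = second & 0x80
    length = second & 0x7F
    if length == 126:
        (length,) = struct.unpack("!H", recv_exact(conn, 2))
    elif length == 127:
        (length,) = struct.unpack("!Q", recv_exact(conn, 8))
    mask = recv_exact(conn, 4) if masked else None
    payload = recv_exact(conn, length)
    if mask:
        payload = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    return opcode, payload


# Stand-in for a browser: push the sample frame through a local socket pair
client_side, server_side = socket.socketpair()
client_side.sendall(b"\x81\x86\x0c'W*=\x15d\x1e8\x12")
opcode, payload = read_frame(server_side)
print(opcode, payload)  # 1 b'123445'
client_side.close()
server_side.close()
```

This sketch still ignores fragmented messages (FIN = 0) and does not answer ping or close frames; those would need handling in a fuller server.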

 Sending a message from the server to the client

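def send_msg(conn, msg_bytes):
    """
    Send a message from the WebSocket server to the client.
    :param conn: the socket connected to the client, i.e. conn, address = sock.accept()
    :param msg_bytes: the bytes to send to the client
    :return: True once the frame has been sent
    """
    import struct

    # 0x81: FIN bit set, opcode 0x1 (text frame)
    token = b"\x81"
    length = len(msg_bytes)
    # Server-to-client frames are not masked, so the MASK bit stays 0
    if length < 126:
        token += struct.pack("B", length)
    elif length <= 0xFFFF:
        token += struct.pack("!BH", 126, length)
    else:
        token += struct.pack("!BQ", 127, length)

    msg = token + msg_bytes
    conn.sendall(msg)
    return True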
The code above sends a message from the server to the client.
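To see what the three length branches produce on the wire, the framing logic can be separated from the socket call. The following is a minimal sketch with a hypothetical pure function `build_frame`; it builds unmasked frames as a server would and only inspects the header bytes.

```python
import struct


def build_frame(payload: bytes, opcode: int = 0x1) -> bytes:
    """Build an unmasked, final (FIN = 1) server-to-client frame."""
    header = bytes([0x80 | opcode])
    length = len(payload)
    if length < 126:
        header += struct.pack("!B", length)         # 7-bit length
    elif length <= 0xFFFF:
        header += struct.pack("!BH", 126, length)   # 126 + 16-bit length
    else:
        header += struct.pack("!BQ", 127, length)   # 127 + 64-bit length
    return header + payload


small = build_frame(b"hi")
medium = build_frame(b"a" * 200)
large = build_frame(b"a" * 70000)
print(small)                 # b'\x81\x02hi'
print(medium[:4].hex())      # 817e00c8
print(len(large) - 70000)    # 10
```

Keeping the frame construction free of I/O also makes it easy to check against the header sizes listed earlier (2, 4 and 10 bytes).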
<script>
    var ws = new WebSocket('ws://127.0.0.1:8002')
    ws.onmessage = function (event) {
        /* Called automatically whenever the server sends data to the client */
        console.log(event.data);

    };
</script>
The code above is the client.

See also

Original article: https://www.cnblogs.com/a438842265/p/12003087.html