Reading Chunked-Encoded Responses from a Socket

1. Chunked Encoding and Content-Length

Content-Length is an HTTP response header field. It tells the browser the size of the response body in bytes.

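Transfer-Encoding: chunked means the body uses chunked encoding. The server cannot tell the browser the total length of the response up front, so it sends the body back in chunks.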

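Content-Length and Transfer-Encoding: chunked are not meant to appear together; a response uses one or the other. If a client does receive both, Transfer-Encoding takes precedence.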
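To make the rule above concrete, here is a minimal sketch of how a client might decide how to read the body once the headers are parsed. The class name BodyFraming, the Framing enum and the detect method are hypothetical names made up for this illustration, and the headers are assumed to be already collected into a Map.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class BodyFraming {

    /** How the response body is delimited (hypothetical names for this sketch). */
    public enum Framing { CHUNKED, CONTENT_LENGTH, UNTIL_CLOSE }

    /** Picks the framing from already-parsed headers; header names are case-insensitive. */
    public static Framing detect(Map<String, String> headers) {
        String transferEncoding = null;
        String contentLength = null;
        for (Map.Entry<String, String> e : headers.entrySet()) {
            String name = e.getKey().toLowerCase(Locale.ROOT);
            if (name.equals("transfer-encoding")) {
                transferEncoding = e.getValue();
            } else if (name.equals("content-length")) {
                contentLength = e.getValue();
            }
        }
        // Transfer-Encoding wins if both headers are present
        if (transferEncoding != null
                && transferEncoding.toLowerCase(Locale.ROOT).contains("chunked")) {
            return Framing.CHUNKED;
        }
        if (contentLength != null) {
            return Framing.CONTENT_LENGTH;
        }
        // Neither header: the body runs until the server closes the connection
        return Framing.UNTIL_CLOSE;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "text/html;charset=UTF-8");
        headers.put("Transfer-Encoding", "chunked");
        System.out.println(detect(headers));
    }
}
```

The sketch only covers the three common cases; it does not validate the Content-Length value or handle other transfer codings.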

2. Data Format of Chunked Encoding

Chunked encoding splits the response body into a number of chunks of arbitrary size, and each chunk is preceded by a line stating its size.

Take this response of mine as an example. Decoded, it looks like this:

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=UTF-8
Content-Language: zh-CN
Transfer-Encoding: chunked
Vary: Accept-Encoding
Date: Mon, 04 Jun 2018 03:26:41 GMT

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8" />
......

With chunked encoding, what actually arrives on the wire looks like this:

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=UTF-8
Content-Language: zh-CN
Transfer-Encoding: chunked
Vary: Accept-Encoding
Date: Mon, 04 Jun 2018 03:27:36 GMT

                // the empty line above separates the response headers from the response body
9
                // the next chunk is 9 bytes long (the size is in hexadecimal)
<!DOCTYPE
                // the 9-byte chunk: <!DOCTYPE
1
                // the next chunk is 1 byte long (the size is in hexadecimal)
 
                // a single space
4
html
1
>
1

                // a line break
1
<
4
html
1
>
1

                // a line break
1
<
4
head
1
>
1

                // a line break
1
<
4
meta
1
 
                // a single space
7
charset
1
=
1
"
5
UTF-8
1
"
1
 
                // a single space
1
/
1
>

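As you can see, the message body is a sequence of chunks, each laid out as:

the chunk size in hexadecimal, followed by \r\n

the chunk data of exactly that many bytes, followed by \r\n

A chunk of size 0 marks the end of the body.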
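The format can also be seen from the sending side. The following is a minimal sketch of an encoder that produces this layout, not what any particular server does; the class name ChunkedEncoderSketch, the method encode and the fixed chunkSize parameter are assumptions made for the illustration (real servers pick chunk sizes however they like).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedEncoderSketch {

    /** Encodes body as chunks of at most chunkSize bytes, then the terminating 0-size chunk. */
    public static byte[] encode(byte[] body, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < body.length; off += chunkSize) {
            int len = Math.min(chunkSize, body.length - off);
            // chunk size in hexadecimal, followed by CRLF
            writeAscii(out, Integer.toHexString(len) + "\r\n");
            // chunk data, followed by CRLF
            out.write(body, off, len);
            writeAscii(out, "\r\n");
        }
        // last chunk (size 0) and the empty line that ends the body
        writeAscii(out, "0\r\n\r\n");
        return out.toByteArray();
    }

    private static void writeAscii(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        byte[] wire = encode("<!DOCTYPE html>".getBytes(StandardCharsets.UTF_8), 9);
        // Show CR and LF as visible escapes so the framing is readable
        System.out.println(new String(wire, StandardCharsets.UTF_8)
                .replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

With a chunk size of 9, the 15-byte string is split into a 9-byte chunk and a 6-byte chunk, followed by the terminating chunk.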

3. Simulating an HTTP Request over a Socket and Parsing the Chunked Response

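    // Needs java.io.InputStream, java.io.OutputStream and java.nio.charset.StandardCharsets.
    // socket and sb are not shown here: presumably a connected Socket
    // and a buffer holding the raw HTTP request text.
    InputStream is = socket.getInputStream();
    OutputStream os = socket.getOutputStream();

    // Send the request
    os.write(sb.toString().getBytes(StandardCharsets.UTF_8));
    os.flush();

    boolean isHeadEnd = false;
    int times = 0;
    byte[] b = new byte[1];

    // Read the response one byte at a time
    while (is.read(b) != -1) {
        if (isHeadEnd) {
            // Handle the response body;
            // parseBody returns true once the last chunk has been read
            if (parseBody(b, is)) {
                break;
            }
        } else {
            // times counts how many bytes of \r\n\r\n have been matched in a row;
            // reaching 4 means the response headers have ended
            if ((times = isBodyBegin(b, times)) == 4) {
                isHeadEnd = true;
            }
            // Handle the response headers
            parseHead(b);
        }
    }

    is.close();
    os.close();
    socket.close();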
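The snippet above does not show how sb is filled. As a rough sketch, the request text could be assembled like this; the class name RequestSketch, the method buildGet and the host example.com are placeholders, not taken from the original code. The request line uses HTTP/1.1 because chunked encoding is an HTTP/1.1 feature and a server will not send it to an HTTP/1.0 client.

```java
public class RequestSketch {

    /** Builds a minimal GET request; every line ends with CRLF and an empty line ends the headers. */
    public static String buildGet(String host, String path) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.1\r\n");
        // Host is mandatory in HTTP/1.1
        sb.append("Host: ").append(host).append("\r\n");
        // Ask the server to close the connection once the response is complete
        sb.append("Connection: close\r\n");
        sb.append("\r\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // example.com is a placeholder host; the matching socket would be
        // opened with something like new java.net.Socket("example.com", 80)
        String request = buildGet("example.com", "/");
        System.out.print(request.replace("\r\n", "\\r\\n\n"));
    }
}
```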

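    // Needs java.io.EOFException, java.io.InputStream and java.io.UnsupportedEncodingException.
    // BYTE_R and BYTE_N are not defined in the original listing; they are assumed to be:
    private static final byte BYTE_R = '\r';
    private static final byte BYTE_N = '\n';

    /**
     * @param b the byte that was just read
     * @param t how many bytes of \r\n\r\n have been matched so far
     * @return the updated match count; 4 means \r\n\r\n has been seen
     */
    private int isBodyBegin(byte[] b, int t) {
        if (b[0] == BYTE_R) {
            // \r is expected at positions 0 and 2; anywhere else it starts a new match
            return (t % 2 == 0) ? t + 1 : 1;
        } else if (b[0] == BYTE_N && t % 2 == 1) {
            // \n only counts directly after a \r
            return t + 1;
        } else {
            return 0;
        }
    }

    private void parseHead(byte[] b) throws UnsupportedEncodingException {
        System.out.print(new String(b, "UTF-8"));
    }

    private boolean parseBody(byte[] b, InputStream is) throws Exception {
        // Read the chunk-size line up to the \r
        StringBuilder sb = new StringBuilder();
        while (b[0] != BYTE_R) {
            sb.append((char) b[0]);
            if (is.read(b) == -1) {
                throw new EOFException("stream ended inside the chunk-size line");
            }
        }
        // Skip the \n that follows the chunk size
        is.read(b);

        // Convert the hexadecimal size to decimal
        int length = Integer.parseInt(sb.toString().trim(), 16);

        // A chunk of size 0 means the whole body has been read
        if (length == 0) {
            return true;
        }

        // Read the chunk data; one read() call may return fewer bytes
        // than requested, so loop until the chunk is complete
        byte[] content = new byte[length];
        int off = 0;
        while (off < length) {
            int n = is.read(content, off, length - off);
            if (n == -1) {
                throw new EOFException("stream ended inside a chunk");
            }
            off += n;
        }
        // Caveat: a multi-byte UTF-8 character split across two chunks
        // will not decode correctly when each chunk is decoded on its own
        System.out.print(new String(content, "UTF-8"));

        // Skip the \r\n that ends the chunk
        is.read(b);
        is.read(b);
        return false;
    }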
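The parsing logic can be tried without a network connection by feeding it bytes from memory. The following is a self-contained sketch along the same lines as parseBody, with two differences that are my own choices rather than the original design: it collects the raw bytes of all chunks and decodes them once at the end (so a multi-byte character split across chunks is not a problem), and it skips chunk extensions and trailer fields. The class name ChunkedDecoderSketch and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedDecoderSketch {

    /** Reads a chunked body from in (positioned just after the headers) and returns the decoded bytes. */
    public static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            // Size line: hexadecimal size, optionally followed by ";extension"
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int length = Integer.parseInt(sizeLine.trim(), 16);
            if (length == 0) {
                // Last chunk: skip any trailer fields up to the final empty line
                while (!readLine(in).isEmpty()) {
                    // ignore trailer field
                }
                return body.toByteArray();
            }
            // read() may return fewer bytes than asked for, so loop
            byte[] chunk = new byte[length];
            int off = 0;
            while (off < length) {
                int n = in.read(chunk, off, length - off);
                if (n == -1) {
                    throw new EOFException("stream ended inside a chunk");
                }
                off += n;
            }
            body.write(chunk, 0, length);
            // Consume the CRLF that follows the chunk data
            readLine(in);
        }
    }

    /** Reads bytes up to and including LF and returns the line without its CRLF. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                int end = line.length();
                if (end > 0 && line.charAt(end - 1) == '\r') {
                    line.setLength(end - 1);
                }
                return line.toString();
            }
            line.append((char) c);
        }
        throw new EOFException("stream ended before the end of a line");
    }

    public static void main(String[] args) throws IOException {
        // The first four chunks from the example response, plus the terminating chunk
        String wire = "9\r\n<!DOCTYPE\r\n1\r\n \r\n4\r\nhtml\r\n1\r\n>\r\n0\r\n\r\n";
        byte[] body = decode(new ByteArrayInputStream(wire.getBytes(StandardCharsets.UTF_8)));
        System.out.println(new String(body, StandardCharsets.UTF_8)); // prints <!DOCTYPE html>
    }
}
```

Reading the size line with a small readLine helper keeps the CRLF handling in one place, at the cost of still reading the framing bytes one at a time.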