TinyHTTPd Source Code Analysis

TinyHTTPd

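TinyHTTPd is an ultra-lightweight HTTP server written in C. At a little over 500 lines, it is meant for study rather than production use. Reading it is a good way to get a first understanding of what a web server really does.

Home page: http://tinyhttpd.sourceforge.net/

Annotated source: https://github.com/tw1996/TinyHTTPd

The HTTP Protocol

Before reading the source, we need a basic understanding of HTTP. Put simply, HTTP specifies the format in which a client and a server talk to each other. It is built on top of TCP and uses port 80 by default, but it has nothing to do with how packets are transported; it only defines the rules of the conversation. HTTP itself is connectionless: once the TCP connection is established, data can be sent right away with no separate HTTP-level connection setup, and retransmission of lost packets is left to TCP. The main HTTP versions are briefly introduced below.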

HTTP/0.9

Once the TCP connection is established, the client can only issue GET requests:

GET /index.html

The server can only reply with an HTML string:

<html>
  <body>Hello World</body>
</html>

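The TCP connection is closed as soon as the response has been sent.

HTTP/1.0

Compared with HTTP/0.9, HTTP/1.0 adds many features. Content of any format can be transferred, including text, binary data, files, and audio, and the GET, POST, and HEAD commands are supported.

The message format also changed: header information was added, along with status codes, multiple character sets, multi-part types, authorization, caching, content encoding, and more. An HTTP message therefore has three parts: the start line, the header lines, and the entity body. The header lines and the entity body are separated by a blank line, and the start line and every header line end with \r\n. For example: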

Request

GET / HTTP/1.0                                                // request line
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)   // request header
Accept: */*                                                   // request header

Response

HTTP/1.0 200 OK                                   // status line
Content-Type: text/plain                          // response headers
Content-Length: 137582
Expires: Thu, 05 Dec 1997 16:00:00 GMT
Last-Modified: Wed, 5 August 1996 15:55:28 GMT
Server: Apache 0.84

<html>                                            // response body
  <body>Hello World</body>
</html>
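To make the three-part layout concrete, here is a minimal sketch in C that assembles such a response in memory. The function name build_response and its parameters are made up for this illustration; they are not part of TinyHTTPd.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not from TinyHTTPd.
 * Writes a minimal HTTP/1.0 response into out (capacity cap).
 * Returns the number of bytes written, or -1 if out is too small. */
int build_response(char *out, size_t cap, const char *content_type,
                   const char *body)
{
    int n = snprintf(out, cap,
                     "HTTP/1.0 200 OK\r\n"       /* status line */
                     "Content-Type: %s\r\n"      /* header line */
                     "Content-Length: %zu\r\n"   /* header line */
                     "\r\n"                      /* blank line ends the headers */
                     "%s",                       /* entity body */
                     content_type, strlen(body), body);

    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

Calling build_response(buf, sizeof buf, "text/html", "<html></html>") fills buf with a status line, two header lines, the blank line, and then the body, each header line terminated by \r\n.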

HTTP/1.0 requires the header information to be ASCII, while the data that follows can be in any format; the Content-Type field declares that format. Some common Content-Type values:

text/plain
text/html
text/css
image/jpeg
image/png
image/svg+xml
audio/mp4
video/mp4
application/javascript
application/pdf
application/zip
application/atom+xml
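A server that serves static files has to pick one of these values for each file. One common approach, sketched below, is to look at the file extension. The function guess_content_type and the fallback value application/octet-stream are assumptions made for this example; this is not how TinyHTTPd itself chooses the header.

```c
#include <string.h>
#include <strings.h>   /* strcasecmp */

/* Hypothetical helper, not from TinyHTTPd.
 * Maps a file name to a Content-Type value by its extension. */
const char *guess_content_type(const char *path)
{
    static const struct { const char *ext; const char *type; } table[] = {
        { ".html", "text/html" },
        { ".css",  "text/css" },
        { ".txt",  "text/plain" },
        { ".jpg",  "image/jpeg" },
        { ".png",  "image/png" },
        { ".svg",  "image/svg+xml" },
        { ".js",   "application/javascript" },
        { ".pdf",  "application/pdf" },
        { ".zip",  "application/zip" },
    };
    const char *dot = strrchr(path, '.');   /* last '.' in the name */
    size_t i;

    if (dot != NULL)
        for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
            if (strcasecmp(dot, table[i].ext) == 0)
                return table[i].type;
    return "application/octet-stream";      /* assumed fallback for unknown types */
}
```

The comparison is case-insensitive, so INDEX.HTML and index.html map to the same type.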

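To improve efficiency, some browsers used a non-standard field, Connection: keep-alive, which keeps one TCP connection open for several HTTP exchanges until either the client or the server closes it.

HTTP/1.1

HTTP/1.1 is the most popular version of HTTP today. TCP connections are reused by default, so Connection: keep-alive no longer has to be set explicitly; the client sends Connection: close with its last request to end the connection.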
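The difference between the two defaults can be summarized in a few lines of C. This is only a sketch of the rule described above; should_keep_alive and its parameters are hypothetical names, and TinyHTTPd itself always closes the connection after one request.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp */

/* Hypothetical helper, not from TinyHTTPd.
 * http_minor: 0 for HTTP/1.0, 1 for HTTP/1.1.
 * connection: value of the Connection header, or NULL if it is absent.
 * Returns 1 if the TCP connection should stay open, 0 otherwise. */
int should_keep_alive(int http_minor, const char *connection)
{
    if (connection != NULL) {
        if (strcasecmp(connection, "close") == 0)
            return 0;               /* explicit request to close */
        if (strcasecmp(connection, "keep-alive") == 0)
            return 1;               /* explicit request to reuse */
    }
    /* no usable header: HTTP/1.1 reuses by default, HTTP/1.0 does not */
    return http_minor >= 1;
}
```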

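It adds several methods: PUT, PATCH, OPTIONS, and DELETE.

It introduces pipelining. Previously the client sent one request and waited for the response before sending the next. Now it can send several requests back to back without waiting, although the server still responds in order. The Content-Length field tells the receiver where one response ends and the next begins.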
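Since the body length comes from a header line, a receiver has to pull the number out of that line. A minimal sketch follows; parse_content_length is a hypothetical helper written for this article, not TinyHTTPd code.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp */

/* Hypothetical helper, not from TinyHTTPd.
 * Returns the Content-Length value found in one header line, or -1 if
 * the line is a different header or carries no non-negative number. */
long parse_content_length(const char *line)
{
    static const char name[] = "Content-Length:";
    size_t name_len = sizeof(name) - 1;
    const char *value_start;
    char *end;
    long value;

    /* header names are case-insensitive */
    if (strncasecmp(line, name, name_len) != 0)
        return -1;
    value_start = line + name_len;
    value = strtol(value_start, &end, 10);   /* skips leading spaces */
    if (end == value_start || value < 0)
        return -1;                           /* no digits, or negative */
    return value;
}
```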

There are only two ways to avoid head-of-line blocking: send less data, or open several persistent connections at once.

HTTP/2

HTTP/2 is not covered here.

CGI and FastCGI

See this article: http://www.php-internals.com/book/?p=chapt02/02-02-03-fastcgi

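Workflow

The server starts. If no port is specified, it picks a random one, then creates a socket and listens for client connections.
accept() blocks until a client connects. When one does, a new thread is created to handle that connection.
accept_request() handles the client connection. It first parses the HTTP request message. Only GET and POST are supported; any other method gets an HTTP 501 error. Request parameters, if any, are recorded in query_string, and the requested path is recorded in path. If the request names a directory, index.html in that directory is served.
Finally the request type is determined. A static request is served by reading the file and sending it to the client. For a dynamic request, the server fork()s a child process, which calls one of the exec() family of functions to run the CGI script. The parent then reads the child's output; parent and child communicate through pipes.
After the child exits, the parent closes the connection, completing one HTTP request.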
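The parsing step can be sketched compactly. The version below uses sscanf() instead of the character-by-character loops in the real source, so treat it as an illustration of the idea rather than TinyHTTPd's code; parse_request_line and its parameters are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not from TinyHTTPd.
 * Splits "METHOD /path?query HTTP/x.y" into its parts.
 * method and path must each have room for at least 256 bytes.
 * *query is set to point into path just after the '?', or to NULL if
 * the URL has no query string.
 * Returns 0 on success, -1 if the line is malformed. */
int parse_request_line(const char *line, char *method, char *path,
                       char **query)
{
    char *q;

    /* %255s reads at most 255 non-space characters plus the terminator */
    if (sscanf(line, "%255s %255s", method, path) != 2)
        return -1;
    q = strchr(path, '?');
    if (q != NULL) {
        *q = '\0';          /* cut the path off at the '?' */
        *query = q + 1;     /* parameters start right after it */
    } else {
        *query = NULL;
    }
    return 0;
}
```

For the line GET /index.php?r=param HTTP/1.1 this yields the method GET, the path /index.php, and the query string r=param.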

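Source Code Analysis

Start with the program entry point. The server creates a socket, binds it to a sockaddr_in structure, and calls listen() to listen for connection requests on it. All of these steps are implemented in startup().

The server then accepts client requests with accept(). If there are none, accept() blocks; when one arrives, a new thread is created to handle it.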

int main(void)
{
    /* socket-related state */
    int server_sock = -1;
    u_short port = 4000;
    int client_sock = -1;
    struct sockaddr_in client_name;
    socklen_t  client_name_len = sizeof(client_name);
    pthread_t newthread;

    server_sock = startup(&port);
    printf("httpd running on port %d\n", port);

    while (1)
    {
        /* accept a client connection (blocking) */
        client_sock = accept(server_sock,
                (struct sockaddr *)&client_name,
                &client_name_len);
        if (client_sock == -1)
            error_die("accept");
        /* accept_request(&client_sock); */
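        /* start a thread to handle the client request; note that client_sock
         * is passed by address, so a later accept() may overwrite it before
         * the new thread has read it */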
        if (pthread_create(&newthread , NULL, accept_request, (void *)&client_sock) != 0)
            perror("pthread_create");
    }

    close(server_sock);

    return(0);
}
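startup() itself is not listed in this article. As a rough sketch of the socket, bind, and listen sequence described above, something like the following would do. The name startup_sketch, the backlog of 5, and the use of getsockname() to learn a kernel-assigned port are assumptions for this example, not a copy of the project's function.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for startup(), not the project's code.
 * Creates a listening TCP socket. If *port is 0, the kernel picks a free
 * port and *port is updated to it. Returns the socket, or -1 on error. */
int startup_sketch(unsigned short *port)
{
    struct sockaddr_in name;
    socklen_t namelen = sizeof(name);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;
    memset(&name, 0, sizeof(name));
    name.sin_family = AF_INET;
    name.sin_port = htons(*port);               /* 0 means "any free port" */
    name.sin_addr.s_addr = htonl(INADDR_ANY);   /* listen on all interfaces */
    if (bind(fd, (struct sockaddr *)&name, sizeof(name)) < 0)
        goto fail;
    if (*port == 0) {
        /* ask the kernel which port it actually assigned */
        if (getsockname(fd, (struct sockaddr *)&name, &namelen) == -1)
            goto fail;
        *port = ntohs(name.sin_port);
    }
    if (listen(fd, 5) < 0)                      /* backlog of 5 is an assumption */
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

Passing a port of 0 gives the "random port" behaviour mentioned in the workflow; passing 4000, as main() does, binds to that fixed port.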

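accept_request() handles the client request and performs basic error handling. Its main job is to decide whether a request is static or dynamic: a static request is served by reading the file and sending it to the client, while a dynamic request is passed to execute_cgi().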

/**********************************************************************/
/* A request has caused a call to accept() on the server port to
 * return.  Process the request appropriately.
 * Parameters: the socket connected to the client 
 * Handles one client connection.
 * */
/**********************************************************************/
void *accept_request(void *arg)
{
    int client = *(int*)arg;
    char buf[1024];
    size_t numchars;
    char method[255];
    char url[255];
    char path[512];
    size_t i, j;
    struct stat st;
    int cgi = 0;      /* becomes true if server decides this is a CGI
                       * program */
    char *query_string = NULL;
    
    /* read the request line and return its byte count, e.g. GET /index.html HTTP/1.1 */
    numchars = get_line(client, buf, sizeof(buf));
    /* debug */
    //printf("%s", buf);

    /* copy the request method (GET or POST) into method */
    i = 0; j = 0;
    while (!ISspace(buf[i]) && (i < sizeof(method) - 1))
    {
        method[i] = buf[i];
        i++;
    }
    j=i;
    method[i] = '\0';

    /* only GET and POST are supported */
    if (strcasecmp(method, "GET") && strcasecmp(method, "POST"))
    {
        unimplemented(client);
        return NULL;
    }

    /* a POST request is always handled by CGI */
    if (strcasecmp(method, "POST") == 0)
        cgi = 1;

    i = 0;
    while (ISspace(buf[j]) && (j < numchars))
        j++;
    while (!ISspace(buf[j]) && (i < sizeof(url) - 1) && (j < numchars))
    {
        url[i] = buf[j];
        i++; j++;
    }
    /* store the requested URL, including any query parameters */
    url[i] = '\0';

    //printf("%s\n", url);

    if (strcasecmp(method, "GET") == 0)
    {
        /* query_string holds the request parameters: in
         * index.php?r=param it is the r=param after the '?' */
        query_string = url;
        while ((*query_string != '?') && (*query_string != '\0'))
            query_string++;
        /* a '?' means a dynamic request, so enable CGI */
        if (*query_string == '?')
        {
            cgi = 1;
            *query_string = '\0';
            query_string++;
        }
    }

//    printf("%s\n", query_string);

    /* the document root is htdocs; a path ending in '/' defaults to index.html */
    sprintf(path, "htdocs%s", url);
    if (path[strlen(path) - 1] == '/')
        strcat(path, "index.html");

    //printf("%s\n", path);
    /* look up the file and store its metadata in st */
    if (stat(path, &st) == -1) {
        /* file not found: discard the remaining HTTP request headers */
        while ((numchars > 0) && strcmp("\n", buf))  /* read & discard headers */
            numchars = get_line(client, buf, sizeof(buf));
        /* 404 not found */
        not_found(client);
    }
    else
    {

        /* if the request names a directory, serve its index.html */
        if ((st.st_mode & S_IFMT) == S_IFDIR)
            strcat(path, "/index.html");
        /* an executable file is treated as a CGI program */
        if ((st.st_mode & S_IXUSR) ||
                (st.st_mode & S_IXGRP) ||
                (st.st_mode & S_IXOTH)    )
            cgi = 1;
        if (!cgi)
            /* static page */
            serve_file(client, path);
        else
            /* run the CGI program */
            execute_cgi(client, path, method, query_string);
    }

    close(client);
    return NULL;
}
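accept_request() leans on get_line(), which is not shown in this article. The comparisons against "\n" above suggest that it returns each header line with its line ending normalized to a single newline. The following is a sketch of such a helper under that assumption; read_line_sketch is a hypothetical name and the code is an illustration, not the project's implementation.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for get_line(), not the project's code.
 * Reads one line from sock into buf (capacity size), one byte at a time.
 * A trailing "\r\n" or lone "\r" is stored as a single '\n'.
 * Returns the number of bytes stored, not counting the terminator. */
int read_line_sketch(int sock, char *buf, int size)
{
    int i = 0;
    char c = '\0';
    ssize_t n;

    while (i < size - 1 && c != '\n') {
        n = recv(sock, &c, 1, 0);
        if (n <= 0)
            break;                          /* error or peer closed */
        if (c == '\r') {
            /* peek at the next byte; consume it only if it is '\n' */
            n = recv(sock, &c, 1, MSG_PEEK);
            if (n > 0 && c == '\n')
                recv(sock, &c, 1, 0);
            else
                c = '\n';
        }
        buf[i++] = c;
    }
    buf[i] = '\0';
    return i;
}
```

With a helper like this, the blank line that ends the headers arrives as the one-character string "\n", which is exactly what the strcmp("\n", buf) loops test for.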


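The next function is the heart of the server. The idea is as follows:

The server fork()s a CGI child process, which calls one of the exec family of functions to carry out the request. The parent reads the child's output and sends it to the client.

Parent and child communicate through anonymous pipes. CGI programs use standard input and standard output, so to reach those streams we redirect them to pipes: stdin is redirected to the cgi_input pipe and stdout to the cgi_output pipe.

The parent closes the read end of cgi_input and the write end of cgi_output; the child closes the write end of cgi_input and the read end of cgi_output.

Data flows as follows: cgi_input[1] (parent) -----> cgi_input[0] (child) [runs the CGI program] -----> stdin -----> stdout -----> cgi_output[1] (child) -----> cgi_output[0] (parent) [sends the result to the client]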

/**********************************************************************/
/* Execute a CGI script.  Will need to set environment variables as
 * appropriate.
 * Parameters: client socket descriptor
 *             path to the CGI script */
/**********************************************************************/
void execute_cgi(int client, const char *path,
        const char *method, const char *query_string)
{
    char buf[1024];
    int cgi_output[2];
    int cgi_input[2];
    pid_t pid;
    int status;
    int i;
    char c;
    int numchars = 1;
    int content_length = -1;

    buf[0] = 'A'; buf[1] = '\0';
    if (strcasecmp(method, "GET") == 0)
        /* read and discard the HTTP request headers */
        while ((numchars > 0) && strcmp("\n", buf))  /* read & discard headers */
            numchars = get_line(client, buf, sizeof(buf));
    else if (strcasecmp(method, "POST") == 0) /*POST*/
    {
        numchars = get_line(client, buf, sizeof(buf));
        while ((numchars > 0) && strcmp("\n", buf))
        {
            buf[15] = '\0';
            /* read the length of the HTTP message body */
            if (strcasecmp(buf, "Content-Length:") == 0)
                content_length = atoi(&(buf[16]));
            numchars = get_line(client, buf, sizeof(buf));
        }
        if (content_length == -1) {
            bad_request(client);
            return;
        }
    }
    else/*HEAD or other*/
    {
    }


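    /*
     * Create two pipes for communication between parent and child. CGI uses
     * standard input and standard output, so stdin is redirected to the
     * cgi_input pipe and stdout to the cgi_output pipe.
     * Why two pipes? A pipe carries data in one direction only: one end is
     * for reading and the other for writing. We have two streams, standard
     * input and standard output, so we need two pipes.
     * */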
    if (pipe(cgi_output) < 0) {
        cannot_execute(client);
        return;
    }
    if (pipe(cgi_input) < 0) {
        cannot_execute(client);
        return;
    }

    /* fork a child to run the CGI program; its standard output travels
     * through the pipe to the parent, which sends it to the client */
    if ( (pid = fork()) < 0 ) {
        cannot_execute(client);
        return;
    }
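    /* send the 200 OK status line from the parent only; without this
     * check both processes would send it after fork() */
    if (pid > 0) {
        sprintf(buf, "HTTP/1.0 200 OK\r\n");
        send(client, buf, strlen(buf), 0);
    }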

    /* the child runs the CGI script */
    if (pid == 0)  /* child: CGI script */
    {
        char meth_env[255];
        char query_env[255];
        char length_env[255];
        
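        dup2(cgi_output[1], STDOUT);    // redirect stdout to the write end of cgi_output
        dup2(cgi_input[0], STDIN);      // redirect stdin to the read end of cgi_input
        close(cgi_output[0]);           // close the read end of cgi_output
        close(cgi_input[1]);            // close the write end of cgi_input

        /* add these to the child's environment */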
        sprintf(meth_env, "REQUEST_METHOD=%s", method);
        putenv(meth_env);
        if (strcasecmp(method, "GET") == 0) {
            // set the QUERY_STRING environment variable
            sprintf(query_env, "QUERY_STRING=%s", query_string);
            putenv(query_env);
        }
        else {   /* POST */
            sprintf(length_env, "CONTENT_LENGTH=%d", content_length);
            putenv(length_env);
        }
        // finally, the child calls an exec-family function to run the external script
        execl(path,path,NULL);
        exit(0);
    } else {    /* parent */
        /* the parent closes the write end of cgi_output and the read end of cgi_input */
        close(cgi_output[1]);
        close(cgi_input[0]);
        /* for POST, keep reading the request body and write it into the
         * cgi_input pipe, from which the child reads */
        if (strcasecmp(method, "POST") == 0)
            for (i = 0; i < content_length; i++) {
                recv(client, &c, 1, 0);
                write(cgi_input[1], &c, 1);
            }
        /* read the child's output from the cgi_output pipe and send it to the client */
        while (read(cgi_output[0], &c, 1) > 0)
            send(client, &c, 1, 0);
        /* close the pipes */
        close(cgi_output[0]);
        close(cgi_input[1]);
        /* wait for the child to exit */
        waitpid(pid, &status, 0);
    }
}
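The two-pipe arrangement can be tried in isolation, without any HTTP involved. The sketch below feeds a string to a child program's stdin and captures what the child writes to stdout. The function run_with_pipes and the pipe names to_child and from_child are invented for this example (they play the roles of cgi_input and cgi_output), and the code assumes the input is small enough to fit in the pipe buffer.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not from TinyHTTPd.
 * Runs prog with `input` on its stdin and copies its stdout into out
 * (capacity cap, which must be at least 1). Assumes a small input.
 * Returns the number of bytes captured, or -1 on error. */
ssize_t run_with_pipes(const char *prog, const char *input,
                       char *out, size_t cap)
{
    int to_child[2], from_child[2];   /* like cgi_input and cgi_output */
    pid_t pid;
    ssize_t n, w;
    size_t total = 0;

    if (pipe(to_child) < 0)
        return -1;
    if (pipe(from_child) < 0) {
        close(to_child[0]); close(to_child[1]);
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0) {                           /* child */
        dup2(to_child[0], STDIN_FILENO);      /* stdin reads from the parent */
        dup2(from_child[1], STDOUT_FILENO);   /* stdout writes to the parent */
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execlp(prog, prog, (char *)NULL);
        _exit(127);                           /* reached only if exec failed */
    }

    /* parent: keep only the write end of to_child and the read end of from_child */
    close(to_child[0]);
    close(from_child[1]);
    w = write(to_child[1], input, strlen(input));
    close(to_child[1]);                       /* EOF tells the child the input is done */
    if (w < 0) {
        close(from_child[0]);
        waitpid(pid, NULL, 0);
        return -1;
    }
    while (total < cap - 1 &&
           (n = read(from_child[0], out + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(from_child[0]);
    waitpid(pid, NULL, 0);                    /* reap the child */
    return (ssize_t)total;
}
```

Running it with the cat program, for example run_with_pipes("cat", "hello cgi", out, sizeof out), should leave the same string in out, because cat simply copies stdin to stdout. A real CGI script sits in the same position and writes its response instead.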