19. Fluentd Input Plugins: in_http in Detail

The in_http plugin collects log events over HTTP. It starts a REST-style HTTP endpoint that accepts log events sent as requests.

Configuration Example

<source>
  @type http
  port 9880
  bind 0.0.0.0
  body_size_limit 32m
  keepalive_timeout 10s
</source>

Basic Usage
Once a Fluentd node is running with the in_http plugin, log records can be sent to it with POST requests. For example:

# Post a record with the tag "app.log"
$ curl -X POST -d 'json={"foo":"bar"}' http://localhost:9880/app.log

Here the URI path specifies the tag of the log event and the POST body carries the log data; the "json=" prefix indicates the format of the data.
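
The same request can be built from any HTTP client. Below is a minimal Python sketch using only the standard library. The helper name build_event_request is hypothetical, and the endpoint is the http://localhost:9880 address assumed by the curl example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_event_request(base_url, tag, record):
    """Build a POST whose path is the tag and whose form body is json=<record>."""
    payload = json.dumps(record, separators=(",", ":"))
    # urlencode percent-encodes the JSON so it survives form decoding intact.
    body = urlencode({"json": payload}).encode("ascii")
    return Request(f"{base_url}/{tag}", data=body, method="POST")


req = build_event_request("http://localhost:9880", "app.log", {"foo": "bar"})
print(req.get_method(), req.full_url)
print(req.data)
# urllib.request.urlopen(req) would send it to a running Fluentd node.
```

The request is only constructed here, so the sketch runs without a Fluentd node; sending it is left to urlopen.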

By default, in_http sets the event timestamp when it receives the data. To set the timestamp explicitly, pass a time parameter in the URL. For example:

# Overwrite the timestamp to 2018-02-16 04:40:37.3137116
$ curl -X POST -d 'json={"foo":"bar"}' \
  'http://localhost:9880/test.tag?time=1518756037.3137116'
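
The time value is a Unix epoch timestamp in seconds, with an optional fractional part; 1518756037 corresponds to 2018-02-16 04:40:37 UTC. As a rough sketch, the URL could be derived from a datetime like this. The helper name event_url is hypothetical, and the sketch keeps microsecond precision only.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def event_url(base_url, tag, when):
    """Return the endpoint URL with a time=<epoch seconds> query parameter."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    # Take whole seconds and microseconds separately to avoid float rounding.
    seconds = int(when.replace(microsecond=0).timestamp())
    stamp = f"{seconds}.{when.microsecond:06d}"
    query = urlencode({"time": stamp})
    return f"{base_url}/{tag}?{query}"


when = datetime(2018, 2, 16, 4, 40, 37, 313712, tzinfo=timezone.utc)
print(event_url("http://localhost:9880", "test.tag", when))
```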

The next example sends a log record from JavaScript.

// Post a record using XMLHttpRequest
var form = new FormData();
form.set('json', JSON.stringify({"foo": "bar"}));

var req = new XMLHttpRequest();
req.open('POST', 'http://localhost:9880/debug.log');
req.send(form);

As these examples show, any HTTP-capable application can use Fluentd as its logging service in this way.

Parameters
in_http supports the common parameters, plus the following:

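@type: the plugin type. The value must be http.
port: the port to listen on. Default: 9880.
bind: the address to bind to. Default: 0.0.0.0, which listens on all interfaces.
body_size_limit: the maximum size of the POST body, that is, of the log data. Default: 32 MB.
keepalive_timeout: the HTTP keep-alive timeout. Default: 10 seconds.
add_http_headers: whether to add the request headers, prefixed with HTTP_, to the log record. Default: not added.
add_remote_addr: whether to add a REMOTE_ADDR field to the log record. Default: not added.
When enabled, the field holds the client's IP address. If the request carries multiple X-Forwarded-For headers, in_http uses the first X-Forwarded-For value as REMOTE_ADDR. For example: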

X-Forwarded-For: host1, host2
X-Forwarded-For: host3

This request lists three hosts across its two X-Forwarded-For headers, and in_http takes host1 as the value of REMOTE_ADDR.
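
The selection rule described above can be illustrated as follows. This is only a sketch of the stated behavior, not Fluentd's actual implementation; the function name first_forwarded_for is hypothetical, and skipping empty header values is an assumption of the sketch.

```python
def first_forwarded_for(header_values):
    """Return the first host of the first non-empty X-Forwarded-For header."""
    for value in header_values:
        hosts = [host.strip() for host in value.split(",") if host.strip()]
        if hosts:
            return hosts[0]
    return None  # no X-Forwarded-For information was sent


# Header values in the order they appear in the request.
headers = ["host1, host2", "host3"]
print(first_forwarded_for(headers))
```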

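cors_allow_origins: a whitelist of origins allowed for CORS. Default: cross-origin requests are not allowed.
The value is an array, for example ["domain1", "domain2"].
For origins outside the whitelist, in_http responds with a 403 error. Since Fluentd v1.2.6, the value may contain the wildcard * to accept requests from any origin. For example: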

<source>
  @type http
  port 9880
  cors_allow_origins ["*"]
</source>

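respond_with_empty_img: whether to respond with a 1×1 image. By default the response body is an empty string.
<transport>: this section configures whether TLS is used for transport. For example: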

<transport tls>
  cert_path /path/to/fluentd.crt
  # other parameters
</transport>

<parse>: this directive sets the parser plugin used to parse incoming logs. For example:

<source>
  @type http
  port 9880
  <parse>
    @type regexp
    expression /^(?<field1>\d+):(?<field2>\w+)$/
  </parse>
</source>

Here the regexp parser handles the incoming logs, so the POST body is no longer in JSON format.
A record can be sent with the following command:

# This will be parsed into {"field1":"123456","field2":"awesome"}
$ curl -X POST -d '123456:awesome' http://localhost:9880/app.log
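
To check such a pattern before deploying it, an equivalent expression can be tried locally. The sketch below uses Python's re module, assuming the pattern is meant to match digits, a colon, then word characters; Python spells named groups (?P<name>...), and its regex engine is not the Ruby one Fluentd uses, so treat it as an approximation.

```python
import re

# Python equivalent of the Ruby pattern /^(?<field1>\d+):(?<field2>\w+)$/
PATTERN = re.compile(r"^(?P<field1>\d+):(?P<field2>\w+)$")


def parse_line(line):
    """Return the named captures as a dict, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


print(parse_line("123456:awesome"))
print(parse_line("not a match"))
```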

FAQ

How do I send data to in_http in MessagePack format?
Add the "msgpack=" prefix to the POST body to mark the data as MessagePack. For example:

# Send data in msgpack format
$ msgpack=`echo -e "\x81\xa3foo\xa3bar"`
$ curl -X POST -d "msgpack=$msgpack" http://localhost:9880/app.log

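How do I use the HTTP Content-Type header?
in_http reads the Content-Type header of the request to determine the format of the log data.
For example, setting Content-Type to application/json sends a JSON log without the "json=" prefix.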

$ curl -X POST -d '{"foo":"bar"}' -H 'Content-Type: application/json' http://localhost:9880/app.log

Likewise, setting Content-Type to "application/msgpack" sends a log in MessagePack format.

$ msgpack=`echo -e "\x81\xa3foo\xa3bar"`
$ curl -X POST -d "$msgpack" -H 'Content-Type: application/msgpack' http://localhost:9880/app.log
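
The bytes in these commands are the MessagePack encoding of {"foo":"bar"}: 0x81 is a one-entry map and 0xa3 introduces a three-byte string. The sketch below hand-encodes such a record to make that visible. The encoder pack_flat_map is hypothetical and deliberately tiny (flat maps of short strings only); a real application would more likely use a MessagePack library.

```python
from urllib.request import Request


def pack_flat_map(record):
    """Encode a dict of short strings as MessagePack (fixmap and fixstr only)."""
    if len(record) > 15:
        raise ValueError("a fixmap holds at most 15 entries")
    out = bytearray([0x80 | len(record)])  # fixmap header: 1000xxxx
    for key, value in record.items():
        for text in (key, value):
            raw = text.encode("utf-8")
            if len(raw) > 31:
                raise ValueError("a fixstr holds at most 31 bytes")
            out.append(0xA0 | len(raw))  # fixstr header: 101xxxxx
            out += raw
    return bytes(out)


payload = pack_flat_map({"foo": "bar"})
req = Request(
    "http://localhost:9880/app.log",
    data=payload,
    headers={"Content-Type": "application/msgpack"},
    method="POST",
)
print(payload)
```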

Performance Tuning

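Use batch mode for large volumes of data
Several log records can be combined into an array and sent to the in_http node in a single HTTP request.
For example: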

# Send multiple events as a JSON array
$ curl -X POST -d 'json=[{"foo":"bar"},{"abc":"def"},{"xyz":"123"}]' http://localhost:9880/app.log

Fewer HTTP requests are needed this way, which raises throughput.
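
A client that buffers records could group them into such arrays before sending. The following is a minimal sketch; batch_bodies and batch_size are hypothetical names, and a suitable batch size depends on body_size_limit and on the size of the records.

```python
import json
from urllib.parse import parse_qs, urlencode


def batch_bodies(records, batch_size):
    """Yield one form-encoded json=[...] body per batch_size records."""
    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        payload = json.dumps(chunk, separators=(",", ":"))
        yield urlencode({"json": payload}).encode("ascii")


records = [{"foo": "bar"}, {"abc": "def"}, {"xyz": "123"}]
bodies = list(batch_bodies(records, batch_size=2))

# Three records with a batch size of two give two request bodies.
print(len(bodies))
# Decoding the first body recovers the first two records.
print(parse_qs(bodies[0].decode("ascii"))["json"][0])
```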

Compress data to save bandwidth
Since v1.2.3, Fluentd can handle gzip-compressed payloads.
The Content-Encoding header of the request specifies how the data is encoded (compressed).

# Send gzip-compressed payload
$ echo 'json={"foo":"bar"}' | gzip > json.gz
$ curl --data-binary @json.gz -H "Content-Encoding: gzip" http://localhost:9880/app.log

That is all; Fluentd needs no additional configuration.
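
The client side of this could look as follows in Python, using the standard gzip module. This is a sketch under the same assumptions as the curl example (endpoint http://localhost:9880, tag app.log); the request is built but not sent.

```python
import gzip
from urllib.request import Request

body = b'json={"foo":"bar"}'
compressed = gzip.compress(body)

req = Request(
    "http://localhost:9880/app.log",
    data=compressed,
    # Tells the receiver that the body must be gunzipped before parsing.
    headers={"Content-Encoding": "gzip"},
    method="POST",
)
print(len(body), len(compressed))
```

For a payload this small the compressed form is larger than the original because of the gzip header; the savings only show up on larger, repetitive bodies such as batched records.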

Multi-worker environments
When in_http runs in multi-worker mode, all worker processes listen on the same port.

<system>
  workers 3
</system>
<source>
  @type http
  port 9880
</source>

With this configuration, three worker processes listen on port 9880, and incoming data is routed among them automatically.

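Troubleshooting
Why does in_http remove the "+" signs from my logs?
This comes from how HTTP form encoding works, where "+" stands for a space; Fluentd itself does not strip it.
To avoid it, the application must encode the data properly or send a multipart request.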

For example, a log containing "+" can be sent as follows.

# OK
$ curl -X POST -H 'Content-Type: multipart/form-data' -F 'json={"message":"foo+bar"}' http://localhost:9880/app.log

# Bad
$ curl -X POST -d 'json={"message":"foo+bar"}' http://localhost:9880/app.log
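
The effect can be reproduced locally with Python's standard form decoder, which follows the same convention. This is an illustration of the encoding rule, not of Fluentd's own parsing code.

```python
from urllib.parse import parse_qs, urlencode

# A raw "+" in a form-encoded body is decoded as a space.
raw = 'json={"message":"foo+bar"}'
decoded_raw = parse_qs(raw)["json"][0]
print(decoded_raw)

# Percent-encoding the value first turns "+" into %2B, which survives decoding.
safe = urlencode({"json": '{"message":"foo+bar"}'})
decoded_safe = parse_qs(safe)["json"][0]
print(safe)
print(decoded_safe)
```

With curl, --data-urlencode 'json={"message":"foo+bar"}' applies the same percent-encoding to a form-encoded body.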