ElasticSearch (4): Bulk Operations


Previous post: ElasticSearch (3): CURL Operations

1. Bulk format

{action:{metadata}}\n     // note: "\n" denotes a newline character
{requestbody}\n           (request body)

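Syntax	Description
action	The operation to perform: create (create the document only if it does not exist), update (update a document), index (create a new document or replace an existing one), delete (delete a document).
metadata	The index information the operation applies to; specify the document's _index, _type, and _id.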

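Difference between create and index: if the document already exists, create fails with a "document already exists" error, whereas index succeeds and replaces it.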
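The format above can be sketched in Python as a small serializer. This is only an illustration: the function name `build_bulk_body` and the triple layout `(action, metadata, source)` are assumptions made for this example, not part of any Elasticsearch client. The point it shows is that each action line and each request body occupies exactly one line, and that every line, including the last, ends with a newline.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) triples into a Bulk request body.

    `source` is None for actions that take no request body (delete).
    The function name and argument layout are hypothetical, for illustration.
    """
    lines = []
    for action, metadata, source in operations:
        # Action line, e.g. {"index": {"_index": "books", "_id": "1"}}
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if source is not None:
            # The request body goes on the line right after its action line.
            lines.append(json.dumps(source, ensure_ascii=False))
    # Terminate every line, including the last one, with a newline.
    return "".join(line + "\n" for line in lines)


body = build_bulk_body([
    ("index", {"_index": "books", "_type": "info", "_id": "1"},
     {"name": "西游记", "author": "吴承恩", "price": "40"}),
    ("delete", {"_index": "books", "_type": "info", "_id": "2"}, None),
])
print(body, end="")
```

Because `json.dumps` is called without `indent`, each object stays on a single line, which is what the Bulk format requires.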

Examples:

1. Bulk insert

Suppose the file books.json holds the data to be written in bulk. Its contents are:

{"index":{"_index":"books","_type":"info","_id":"1"}}
{"name":"西游记","author":"吴承恩","price":"40"}
{"index":{"_index":"books","_type":"info","_id":"2"}}
{"name":"三国演义","author":"罗贯中","price":"41"}
{"index":{"_index":"books","_type":"info","_id":"3"}}
{"name":"水浒传","author":"施耐庵","price":"42"}
{"index":{"_index":"books","_type":"info","_id":"4"}}
{"name":"红楼梦","author":"曹雪芹","price":"43"} // note: press Enter after this line; the file must end with a newline, otherwise the request fails
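Before uploading, a file like this can be checked locally. The following is a minimal sketch, assuming a hypothetical helper named `check_bulk_text`; it only verifies the two things mentioned here: the text ends with a newline, and every line is a complete JSON object on its own. It also flags annotations such as the `//` note above, which is a remark for the reader and must not be saved into books.json.

```python
import json


def check_bulk_text(text):
    """Return a list of problems found in a Bulk body; empty if none are found.

    Hypothetical helper for illustration; it does not contact Elasticsearch.
    """
    problems = []
    if not text.endswith("\n"):
        problems.append("missing final newline")
    for number, line in enumerate(text.splitlines(), start=1):
        try:
            json.loads(line)  # each line must be valid JSON by itself
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: {exc.msg}")
    return problems


good = '{"index":{"_index":"books","_type":"info","_id":"4"}}\n' \
       '{"name":"红楼梦","author":"曹雪芹","price":"43"}\n'
print(check_bulk_text(good))               # no problems reported
print(check_bulk_text(good.rstrip("\n")))  # final newline was removed
```

To check a real file, read it first, for example `check_bulk_text(open("books.json", encoding="utf-8").read())`.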

Use Xftp to upload this file to the /home/zhangsan/data/ directory on the Linux host, then run the following command there:

curl -H 'Content-Type:application/json' -XPOST 'http://120.76.217.14:9200/_bulk?pretty' --data-binary '@/home/zhangsan/data/books.json'

Then query the index:

[zhangsan@tomcat-tst data]$ curl -XGET 'http://120.76.217.14:9200/books/info/_search?pretty'
{
  "took" : 813,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "books",
        "_type" : "info",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "西游记",
          "author" : "吴承恩",
          "price" : "40"
        }
      },
      {
        "_index" : "books",
        "_type" : "info",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "三国演义",
          "author" : "罗贯中",
          "price" : "41"
        }
      },
      {
        "_index" : "books",
        "_type" : "info",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "水浒传",
          "author" : "施耐庵",
          "price" : "42"
        }
      },
      {
        "_index" : "books",
        "_type" : "info",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "name" : "红楼梦",
          "author" : "曹雪芹",
          "price" : "43"
        }
      }
    ]
  }
}
[zhangsan@tomcat-tst data]$

All four documents are returned, so the bulk write succeeded.

2. Bulk processing

{"update":{"_index":"books","_type":"info","_id":"1"}} // update a document
{"doc":{"name":"人性的弱点","author":"卡耐基"}} // request body
{"delete":{"_index":"books","_type":"info","_id":"2"}} // delete takes no request body
{"create":{"_index":"books","_type":"info","_id":"10"}} // create the document only if it does not exist
{"name":"孙子兵法","author":"孙武","price":"42"} // request body
{"index":{"_index":"books","_type":"info","_id":"3"}} // create a new document or replace an existing one
{"name":"厚黑学","author":"李宗吾","price":"50"} // request body

These are the four basic actions of a Bulk operation.
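When actions are mixed like this, some may fail while others succeed; for example, create fails if the document already exists. The Bulk response reports this per action: it has a top-level `errors` flag and an `items` list with one entry per action, keyed by the action name. The sketch below collects the failed entries. The helper name `failed_items` is an assumption for this example, and the `sample` dictionary is hand-written in the shape of a Bulk response, not output captured from a server.

```python
def failed_items(response):
    """Return (action, _id, status, reason) for every failed item.

    `response` is the parsed JSON of a Bulk response. Hypothetical helper.
    """
    failures = []
    if not response.get("errors"):
        return failures  # the server reported no failed actions
    for item in response.get("items", []):
        # Each item holds a single key naming the action that was executed.
        for action, result in item.items():
            if "error" in result:
                error = result["error"]
                reason = error.get("reason") if isinstance(error, dict) else str(error)
                failures.append((action, result.get("_id"), result.get("status"), reason))
    return failures


# Illustrative sample only: values are made up to show the structure.
sample = {
    "took": 5,
    "errors": True,
    "items": [
        {"update": {"_index": "books", "_id": "1", "status": 200}},
        {"create": {"_index": "books", "_id": "10", "status": 409,
                    "error": {"type": "version_conflict_engine_exception",
                              "reason": "document already exists"}}},
    ],
}
print(failed_items(sample))
```

Checking `errors` matters because the HTTP request itself can return successfully even when individual actions inside it failed.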

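3. Bulk request size

A Bulk request loads the data to be processed into memory, so the amount of data one request can handle is limited. How much it can handle depends on the hardware, the size and complexity of the documents, and the indexing and search load.

A common recommendation is 1,000-5,000 documents per request, with a total size of 5-15 MB. By default a request cannot exceed 100 MB; this limit can be changed with the http.max_content_length setting in the ES configuration file (elasticsearch.yml under config).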
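One way to stay within such limits is to split the operations into batches on the client before sending them. The following is a minimal sketch under stated assumptions: the name `chunk_operations` is hypothetical, each operation is already serialized (action line plus optional request-body line), and the defaults of 1,000 documents and 5 MB are simply the lower ends of the ranges above, not tuned values.

```python
def chunk_operations(operations, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield lists of serialized operations that respect both limits.

    Each operation is a string containing the action line and, if present,
    its request-body line, with every line newline-terminated.
    Hypothetical helper; the default limits are assumptions.
    """
    batch, batch_bytes = [], 0
    for op in operations:
        op_bytes = len(op.encode("utf-8"))
        # Close the current batch if adding this operation would exceed a limit.
        if batch and (len(batch) >= max_docs or batch_bytes + op_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(op)
        batch_bytes += op_bytes
    if batch:
        yield batch  # remaining operations


ops = ['{"index":{"_id":"%d"}}\n{"n":%d}\n' % (i, i) for i in range(7)]
for batch in chunk_operations(ops, max_docs=3):
    print(len(batch), "operations,", len("".join(batch).encode("utf-8")), "bytes")
```

Each yielded batch can be joined with `"".join(batch)` and sent as one Bulk request. An operation larger than `max_bytes` is still emitted, alone in its own batch, so nothing is silently dropped.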

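4. Avoiding repeated `/index/type`

In the bulk operations above, every action line repeats the same /books/info. You can instead put /books/info in the curl URL. The values in the URL are then used as defaults, and you can still override _index and _type in an individual metadata line:

POST /books/info/_bulk
{"index":{"_id":"3"}} // create a new document or replace an existing one
{"name":"厚黑学","author":"李宗吾","price":"50"} // request body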
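To illustrate the idea, the sketch below takes a Bulk body with full metadata and drops `_index` and `_type` wherever they match the defaults given in the URL, leaving other values in place as overrides. The helper name `strip_defaults` is an assumption for this example, and the input is expected to be a clean Bulk body without `//` annotations.

```python
import json


def strip_defaults(bulk_text, index, doc_type):
    """Remove _index/_type from action lines when they equal the URL defaults.

    Hypothetical helper for illustration; request-body lines are copied as-is.
    """
    out = []
    body_expected = False
    for line in bulk_text.splitlines():
        if body_expected:
            out.append(line)  # request body: keep unchanged
            body_expected = False
            continue
        # An action line is an object with a single key: the action name.
        action, metadata = next(iter(json.loads(line).items()))
        if metadata.get("_index") == index:
            del metadata["_index"]
        if metadata.get("_type") == doc_type:
            del metadata["_type"]
        out.append(json.dumps({action: metadata}, ensure_ascii=False))
        # delete is the only action without a request body on the next line.
        body_expected = action != "delete"
    return "".join(item + "\n" for item in out)


full = (
    '{"index":{"_index":"books","_type":"info","_id":"3"}}\n'
    '{"name":"厚黑学","author":"李宗吾","price":"50"}\n'
    '{"delete":{"_index":"books","_type":"info","_id":"2"}}\n'
    '{"index":{"_index":"other","_type":"info","_id":"9"}}\n'
    '{"name":"x"}\n'
)
print(strip_defaults(full, "books", "info"), end="")
```

The action line that targets the index `other` keeps its `_index`, which is the override case described above.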