Elasticsearch Demo

Index a document with your own ID

hadoop@tinylcy:~$ curl -XPUT localhost:9200/website/blog/123 -d '
> {
>     "title" : "My first blog entry",
>     "text" : "Just trying this out...",
>     "date" : "2014/01/01"
> }
> '
{"_index":"website","_type":"blog","_id":"123","_version":1,"created":true}
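For comparison, here is a minimal Python sketch (standard library only) that builds the same PUT request without sending it. It assumes a node at http://localhost:9200, as in the transcript, and build_index_request is a hypothetical helper name. The explicit Content-Type header does not appear in the transcript. Older releases like the one shown accept a body without it, but Elasticsearch 6.0 and later reject such requests.

```python
import json
import urllib.request

# Assumption: a local node, as in the transcript above.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, doc):
    """Build (but do not send) a PUT request that indexes `doc` under an explicit ID."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    # Newer Elasticsearch releases require a Content-Type for request bodies.
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


req = build_index_request(
    "website", "blog", "123",
    {"title": "My first blog entry", "text": "Just trying this out...", "date": "2014/01/01"},
)
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/website/blog/123
# To actually send it against a running node: urllib.request.urlopen(req)
```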

Index a document with an auto-generated ID

hadoop@tinylcy:~$ curl -XPOST localhost:9200/website/blog/ -d '
> {
>     "title" : "My second blog entry",
>     "text" : "Still trying this out...",
>     "date" : "2014/01/01"
> }
> '
{"_index":"website","_type":"blog","_id":"AU8wPJKqtV1hoSdW50dt","_version":1,"created":true}
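Elasticsearch chooses the ID here, so the client has to read it back from the _id field of the response. The minimal Python sketch below parses the response body copied from the transcript above. generated_id is a hypothetical helper. The created flag is the one this release returns, and newer releases report a result field instead.

```python
import json

# Response body copied from the POST transcript above.
response_body = '{"_index":"website","_type":"blog","_id":"AU8wPJKqtV1hoSdW50dt","_version":1,"created":true}'


def generated_id(body):
    """Return the _id Elasticsearch assigned, as reported in the index response."""
    result = json.loads(body)
    if not result.get("created"):
        raise ValueError("expected a newly created document")
    return result["_id"]


print(generated_id(response_body))  # AU8wPJKqtV1hoSdW50dt
```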

Retrieve a document

hadoop@tinylcy:~$ curl -XGET localhost:9200/website/blog/123?pretty
{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "123",
  "_version" : 1,
  "found" : true,
  "_source":
{
    "title" : "My first blog entry",
    "text" : "Just trying this out...",
    "date" : "2014/01/01"
}

}

hadoop@tinylcy:~$ curl -i -XGET localhost:9200/website/blog/124?pretty
HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 83

{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "124",
  "found" : false
}
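A missing document is reported in two ways: by the 404 status and by "found" : false in the body. The Python sketch below turns either response shown above into the stored _source or None. source_or_none is a hypothetical name, and the bodies are compacted copies of the transcripts.

```python
import json

# Bodies copied (compacted) from the two GET transcripts above.
FOUND_BODY = ('{"_index":"website","_type":"blog","_id":"123","_version":1,"found":true,'
              '"_source":{"title":"My first blog entry","text":"Just trying this out...","date":"2014/01/01"}}')
MISSING_BODY = '{"_index":"website","_type":"blog","_id":"124","found":false}'


def source_or_none(status, body):
    """Return the document's _source, or None when the ID does not exist."""
    result = json.loads(body)
    if status == 404 and not result.get("found"):
        return None
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    return result["_source"]


print(source_or_none(200, FOUND_BODY)["title"])  # My first blog entry
print(source_or_none(404, MISSING_BODY))         # None
```

With urllib.request.urlopen, the 404 case arrives as a urllib.error.HTTPError. Its code attribute and read() method supply the two arguments.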

Retrieve part of a document

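Normally, a GET request returns the whole document, stored in the _source field. But perhaps you are only interested in the title field. You can request individual fields with the _source parameter, separating multiple fields with commas.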

hadoop@tinylcy:~$ curl -XGET localhost:9200/website/blog/123?_source=title,text
{"_index":"website","_type":"blog","_id":"123","_version":1,"found":true,"_source":{"text":"Just trying this out...","title":"My first blog entry"}}

Or, if you want only the _source field without any of the other metadata, you can request:

hadoop@tinylcy:~$ curl -XGET localhost:9200/website/blog/123/_source

{
    "title" : "My first blog entry",
    "text" : "Just trying this out...",
    "date" : "2014/01/01"
}
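Both forms are plain URLs, so a client only has to build them. The Python sketch below assumes a node at localhost:9200 as in the transcript. filtered_url and source_only_url are hypothetical helpers. The commas are kept literal so the URL matches the one typed above.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumption: local node, as in the transcript


def filtered_url(index, doc_type, doc_id, fields):
    """GET URL that asks Elasticsearch to return only `fields` inside _source."""
    query = urlencode({"_source": ",".join(fields)}, safe=",")
    return f"{BASE_URL}/{index}/{doc_type}/{doc_id}?{query}"


def source_only_url(index, doc_type, doc_id):
    """GET URL for the /_source endpoint, which returns the document without metadata."""
    return f"{BASE_URL}/{index}/{doc_type}/{doc_id}/_source"


print(filtered_url("website", "blog", "123", ["title", "text"]))
print(source_only_url("website", "blog", "123"))
```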

Check whether a document exists

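If all you want to do is check whether a document exists, and you are not interested in its content at all, use the HEAD method instead of GET. A HEAD request returns no response body, only the HTTP headers: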

hadoop@tinylcy:~$ curl -i -XHEAD localhost:9200/website/blog/123
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

hadoop@tinylcy:~$ curl -i -XHEAD localhost:9200/website/blog/124
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
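The following is a HEAD-based existence check in Python, sketched under the same localhost:9200 assumption. head_request, exists_from_status and document_exists are hypothetical names. The tests exercise only the first two, because document_exists needs a running node.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, as in the transcript


def head_request(index, doc_type, doc_id):
    """Build a HEAD request for a document; the response carries headers only."""
    return urllib.request.Request(f"{BASE_URL}/{index}/{doc_type}/{doc_id}", method="HEAD")


def exists_from_status(status):
    """Map the HEAD status code to a boolean, as in the two transcripts above."""
    if status == 200:
        return True
    if status == 404:
        return False
    raise RuntimeError(f"unexpected status {status}")


def document_exists(index, doc_type, doc_id):
    """Send the HEAD request; urlopen raises HTTPError for 404, so catch it."""
    try:
        with urllib.request.urlopen(head_request(index, doc_type, doc_id)) as resp:
            return exists_from_status(resp.status)
    except urllib.error.HTTPError as err:
        return exists_from_status(err.code)
```

Connection failures (urllib.error.URLError other than HTTPError) are deliberately left to propagate, so an unreachable node is not mistaken for a missing document.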

Update a whole document

Documents in Elasticsearch are immutable; we cannot change them. If we need to update an existing document, we reindex it or replace it using the index API.

hadoop@tinylcy:~$ curl -XPUT localhost:9200/website/blog/123 -d '
> {
>     "title" : "My first blog entry",
>     "text" : "I am starting to get the hang of this...",
>     "date" : "2014/01/02"
> }
> '
{"_index":"website","_type":"blog","_id":"123","_version":2,"created":false}

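In the response, we can see that Elasticsearch has incremented _version.
The created flag is false because a document with the same ID already existed in the same index and type.
Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version does not disappear immediately, but you can no longer access it. Elasticsearch cleans up deleted documents as you continue to index more data.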
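A toy in-memory model can illustrate the replace-and-bump-version behaviour. ToyIndex is hypothetical and ignores how Elasticsearch actually stores data: here the old copy is simply overwritten rather than marked as deleted and cleaned up later. The model mirrors only the _version and created fields seen in the responses, and shows that a reindex replaces the document instead of merging it.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in for one index/type; not Elasticsearch's storage model."""

    def __init__(self):
        self._docs = {}      # _id -> current _source
        self._versions = {}  # _id -> current _version

    def index(self, doc_id, source):
        """Create or wholly replace a document, reporting it like the index API response."""
        created = doc_id not in self._docs
        self._versions[doc_id] = self._versions.get(doc_id, 0) + 1
        self._docs[doc_id] = dict(source)  # replaced wholesale, fields are not merged
        return {"_id": doc_id, "_version": self._versions[doc_id], "created": created}

    def get(self, doc_id):
        return self._docs.get(doc_id)


blog = ToyIndex()
print(blog.index("123", {"title": "My first blog entry", "text": "Just trying this out..."}))
# {'_id': '123', '_version': 1, 'created': True}
print(blog.index("123", {"title": "My first blog entry", "date": "2014/01/02"}))
# {'_id': '123', '_version': 2, 'created': False}
print(blog.get("123"))
# {'title': 'My first blog entry', 'date': '2014/01/02'}
```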

Create a new document

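When we index a document, how can we be sure that we are creating a completely new one and not overwriting an existing one?
The combination of _index, _type, and _id uniquely identifies a document. So the simplest way to guarantee that a document is new is to use the POST method and let Elasticsearch generate a unique _id.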

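However, if we want to use our own _id, we must tell Elasticsearch to accept the request only if no document with the same _index, _type, and _id already exists. There are two ways to do this, and they do exactly the same thing: add the op_type=create query-string parameter, or use the _create endpoint.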

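If the request successfully creates a new document, Elasticsearch returns the usual metadata with a 201 Created status code. If a document with the same _index, _type, and _id already exists, Elasticsearch returns a 409 Conflict status code instead, as both requests below show: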

hadoop@tinylcy:~$ curl -XPUT localhost:9200/website/blog/123?op_type=create -d '{
>     "title" : "My first blog entry",
>     "text" : "I am starting to get the hang of this...",
>     "date" : "2014/01/02"
> }
> '
{"error":"DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]","status":409}

hadoop@tinylcy:~$ curl -XPUT localhost:9200/website/blog/123/_create -d '
> {
>     "title" : "My first blog entry",
>     "text" : "I am starting to get the hang of this...",
>     "date" : "2014/01/02"
> }
> '
{"error":"DocumentAlreadyExistsException[[website][4] [blog][123]: document already exists]","status":409}
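The create-only behaviour is a put-if-absent operation. The sketch below lists the two equivalent URL forms from the transcripts and models the check on a plain dict. create_urls and create are hypothetical names, and the dict stands in for the server. In a real cluster the check has to happen on the server: a client that sends a GET and then a PUT could race with another writer between the two calls.

```python
BASE_URL = "http://localhost:9200"  # assumption: local node, as in the transcript


def create_urls(index, doc_type, doc_id):
    """Return the two equivalent create-only URLs used with PUT in the transcripts."""
    base = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    return [base + "?op_type=create", base + "/_create"]


def create(store, doc_id, source):
    """Toy put-if-absent on a dict: (201, ...) if stored, (409, ...) if the ID is taken."""
    if doc_id in store:
        return 409, {"error": "document already exists", "status": 409}
    store[doc_id] = dict(source)
    return 201, {"_id": doc_id, "_version": 1, "created": True}


docs = {}
print(create(docs, "123", {"title": "My first blog entry"})[0])  # 201
print(create(docs, "123", {"title": "Something else"})[0])       # 409
print(docs["123"]["title"])                                      # My first blog entry
```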