elasticsearch 7.5.0 学习笔记

温馨提示：电脑端看不到右侧目录的话请减小缩放比例。

API操作—— 新建或删除查询索引库

新建索引库

新建index，要向服务器发送一个PUT请求，下面是使用curl命令新建了一个名为test的index的例子

curl -XPUT 'localhost:9200/test'

Response：

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "test"
}

删除索引库

删除index，要向服务器发送一个delete请求，下面是删除上面的test的index的例子

curl -XDELETE 'http://localhost:9200/customer'

Response:

{
    "acknowledged": true
}

查询索引库

查询index，要发送一个HEAD请求，例子如下

curl -IHEAD 'localhost:9200/test'

Reponse:

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 359

API操作—— 创建映射关系

接下来的操作用curl比较不方便，复杂的内容使用PostMan进行操作，创建映射关系要使用PUT请求，如下json代码代表我定义了三个字段，分别是title，url和doc，他们的type属性即为数据类型，关于elasticsearch的数据类型这里不过多赘述，analyzer为ik_max_word代表使用了ik分词器的最粗粒度分词

PUT test/_mapping

{
    "properties": {
        "title": {
            "type": "text",
            "analyzer": "ik_max_word"
        },
        "url": {
            "type": "text"
        },
        "doc": {
            "type": "text",
            "analyzer": "ik_max_word"
        }
    }
}

Reponse:

{
    "acknowledged": true
}

API操作—— 查询映射关系

查询索引库的映射关系要使用GET请求，如下我查询了上面创建的映射关系

curl -XGET 'localhost:9200/test/_mapping'

Response:

{
  "test" : {
    "mappings" : {
      "properties" : {
        "doc" : {
          "type" : "text",
          "analyzer" : "ik_max_word"
        },
        "title" : {
          "type" : "text",
          "analyzer" : "ik_max_word"
        },
        "url" : {
          "type" : "text"
        }
      }
    }
  }
}

API操作—— 添加更新文档

添加文档

添加数据使用GET请求发送json数据，以上面的索引库为例，我添加以下数据

GET test/_doc

{
    "title": "PlumK's blog",
    "url": "plumk.site",
    "doc": "this is the content,这里是正文"
}

更新文档

更新数据需要知道数据的id，可以通过查找获得，其他的跟添加数据差不多，GET请求http://localhost:9200/test/_doc/bm87BW8BcAj1cF19_7Sfjson如下：（这里的bm87BW8BcAj1cF19_7Sf就是我上面插入数据的id）

GET test/_doc/bm87BW8BcAj1cF19_7Sf

{
    "title": "PlumK's blog",
    "url": "plumk.site",
    "doc": "this is the content,这里是修改后的正文"
}

Response:

{
    "_index": "test",
    "_type": "_doc",
    "_id": "bm87BW8BcAj1cF19_7Sf",
    "_version": 2,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 1,
    "_primary_term": 1
}

API操作—— 删除文档

删除数据同样需要知道id，以删除上面的数据为例，对下面url发送DELETE请求：

localhost:9200/test/_doc/bm87BW8BcAj1cF19_7Sf

API操作—— 查询文档

Match all查询

最基本的查询，他将返回所有结果，每个对象的得分为1

POST /test/_search

{
   "query":{
      "match_all":{}
   }
}

Response:

{
    "took": 9,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "test",
                "_type": "_doc",
                "_id": "bm87BW8BcAj1cF19_7Sf",
                "_score": 1.0,
                "_source": {
                    "title": "PlumK's blog",
                    "url": "plumk.site",
                    "doc": "this is the content,这里是正文"
                }
            }
        ]
    }
}

Match查询

match查询返回match中属性相同的条目，下面如果url为plumk就查询不到了，也就是说必须属性完全相同

POST test/_search

{
	"query": {
		"match": {
			"url": "plumk.site"
		}
	}
}

Reponse:

{
    "took": 11,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "test",
                "_type": "_doc",
                "_id": "bm87BW8BcAj1cF19_7Sf",
                "_score": 0.2876821,
                "_source": {
                    "title": "PlumK's blog",
                    "url": "plumk.site",
                    "doc": "this is the content,这里是修改后的正文"
                }
            }
        ]
    }
}

Query String查询

该查询使用查询解析器和query_string关键字。例如下：

POST test/_search

{
	"query": {
		"query_string": {
			"query": "plumk"
		}
	}
}

Response:

{
    "took": 25,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "test",
                "_type": "_doc",
                "_id": "bm87BW8BcAj1cF19_7Sf",
                "_score": 0.2876821,
                "_source": {
                    "title": "PlumK's blog",
                    "url": "plumk.site",
                    "doc": "this is the content,这里是修改后的正文"
                }
            }
        ]
    }
}

设置最小匹配度查询

设置minimum_should_match字段，如果满足这个匹配度就会被查询出来，不多废话，看例子

GET test/_search

{
  "query": {
    "match": {
      "title":{"query": "plumk的博客","minimum_should_match": "50%"}
    }
  }
}

多字段查询

这里尝试在url和title中搜索plumk

GET test/_search

{
  "query": {
    "multi_match": {
    	"query": "plumk",
    	"fields": ["url", "title"]  
    }
  }
}

接下来的查询我接下来可能使用不到，但是考虑到会有同样刚入门的小白看到这篇博客，所以这里我还是从网上摘抄以下

Term查询

这些查询主要处理数字、日期等结构化数据。如下：

POST /schools/_search

{
   "query":{
      "term":{"zip":"176115"}
   }
}

Reponse:

{
   "took":1, "timed_out":false, "_shards":{"total":10, "successful":10, "failed":0},
   "hits":{
      "total":1, "max_score":0.30685282, "hits":[{
         "_index":"schools", "_type":"school", "_id":"1", "_score":0.30685282,
         "_source":{
            "name":"Central School", "description":"CBSE Affiliation",
            "street":"Nagan", "city":"paprola", "state":"HP", "zip":"176115",
            "location":[31.8955385, 76.8380405], "fees":2200, 
            "tags":["Senior Secondary", "beautiful campus"], "rating":"3.3"
         }
      }]
   }
}

Range查询

该查询用于查找值在值范围之间的对象。为此，我们需要使用如下运算符

gte:大于等于
gt:大于
lte:小于等于
lt:小于

POST /schools*/_search

{
   "query":{
      "range":{
         "rating":{
            "gte":3.5
         }
      }
   }
}

Response:

{
   "took":31, "timed_out":false, "_shards":{"total":10, "successful":10, "failed":0},
   "hits":{
      "total":3, "max_score":1.0, "hits":[
         {
            "_index":"schools", "_type":"school", "_id":"2", "_score":1.0,
            "_source":{
               "name":"Saint Paul School", "description":"ICSE Affiliation",
               "street":"Dawarka", "city":"Delhi", "state":"Delhi", 
               "zip":"110075", "location":[28.5733056, 77.0122136], "fees":5000, 
               "tags":["Good Faculty", "Great Sports"], "rating":"4.5"
            }
         }, 
         {
            "_index":"schools_gov", "_type":"school", "_id":"2", "_score":1.0, 
            "_source":{
               "name":"Government School", "description":"State Board Affiliation",
               "street":"Hinjewadi", "city":"Pune", "state":"MH", "zip":"411057",
               "location":[18.599752, 73.6821995] "fees":500, 
               "tags":["Great Sports"], "rating":"4"
            }
         },
         {
            "_index":"schools", "_type":"school", "_id":"3", "_score":1.0,
            "_source":{
               "name":"Crescent School", "description":"State Board Affiliation",
               "street":"Tonk Road", "city":"Jaipur", "state":"RJ", "zip":"176114", 
               "location":[26.8535922, 75.7923988], "fees":2500,
               "tags":["Well equipped labs"], "rating":"4.5"
            }
         }
      ]
   }
}

复合查询

这个比较复杂，这里也不过多赘述，可以去看这个博客

此外还有Joining查询，这个涉及到嵌套映射，就不展开讲了，还有地理查询，暂时用不到，可以去看官方文档

Java操作——初始化

maven中的pom引用如下：

<dependency>
	<dependency>
		<groupId>org.elasticsearch</groupId>
		<artifactId>elasticsearch</artifactId>
		<version>7.5.0</version>
	</dependency>
	<dependency>
		<groupId>org.elasticsearch.client</groupId>
		<artifactId>elasticsearch-rest-high-level-client</artifactId>
		<version>7.5.0</version>
	</dependency>
	<dependency>
		<groupId>org.elasticsearch.client</groupId>
		<artifactId>elasticsearch-rest-client</artifactId>
		<version>7.5.0</version>
	</dependency>
	<dependency>
		<groupId>org.elasticsearch.client</groupId>
		<artifactId>elasticsearch-rest-client-sniffer</artifactId>
		<version>7.5.0</version>
	</dependency>
	<dependency>
		<groupId>org.apache.logging.log4j</groupId>
		<artifactId>log4j-api</artifactId>
		<version>2.8.2</version>
	</dependency>
	<dependency>
		<groupId>org.apache.logging.log4j</groupId>
		<artifactId>log4j-core</artifactId>
		<version>2.8.2</version>
	</dependency>

java上的elasticsearch可以连接一个或多个主机初始化客户端，假设我本地主机在9200端口和9021都开了elasticsearch：

RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("localhost", 9200, "http"),
                new HttpHost("localhost", 9201, "http")));

关闭客户端：

client.close();

Java 操作——文档增删改

添加文档

首先我们要构建数据，有四种方法，以上面的test索引库为例：

直接提交json字符串，这种比较麻烦：

IndexRequest request = new IndexRequest("test"); //索引
request.id("1"); //id，此项可选，原因前面已经讲过了
String jsonString = "{" +
        ""url":"rasang.site"," +
        ""title":"rasang's blog"," +
        ""doc":"Welcome to my blog."" +
        "}";
request.source(jsonString, XContentType.JSON);

使用Map作为参数：

Map<String, String> jsonMap = new HashMap<>();
jsonMap.put("url", "rasang.site");
jsonMap.put("title", "rasang's blog");
jsonMap.put("doc", "Welcome to my blog.");
IndexRequest indexRequest = new IndexRequest("test").id("1").source(jsonMap);

使用XConttentBuilder构建：

XContentBuilder builder = XContentFactory.jsonBuilder();
builder.startObject();
{
    builder.field("url", "rasang.site");
    builder.field("title", "rasang's blog");
    builder.field("doc", "Welcome to my blog.");
}
builder.endObject();
IndexRequest indexRequest = new IndexRequest("test")
    .id("1").source(builder);

使用键值构建

IndexRequest indexRequest = new IndexRequest("test")
    .id("1")
    .source("url", "rasang.site",
        "title", "rasang's blog",
        "doc", "Welcome to my blog.");

其他可选参数：

request.routing("routing"); //路由值
request.timeout(TimeValue.timeValueSeconds(1)); //设置超时
request.timeout("1s"); ////以字符串形式设置超时时间
request.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL); //以WriteRequest.RefreshPolicy实例形式设置刷新策略
request.setRefreshPolicy("wait_for");//以字符串形式刷新策略                     
request.version(2); //文档版本
request.versionType(VersionType.EXTERNAL); //文档类型
request.opType(DocWriteRequest.OpType.CREATE); //操作类型
request.opType("create"); //操作类型 可选create或update
request.setPipeline("pipeline"); //索引文档之前要执行的摄取管道的名称

构建完成后发送到elasticsearch执行添加，java的执行方式有两种

同步执行：

IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);

同步执行在继续执行后面的代码之前会等待返回索引响应，也就是等待执行结果，可能相应失败或者超时之类的。需要捕获异常。

异步执行

client.indexAsync(request, RequestOptions.DEFAULT, listener);

异步的方法不会阻塞并立即返回。当执行成功后，使用onResponse的方法回调操作监听器，如果失败，则使用onFailure方法回调监听器。监听器实例如下：

listener = new ActionListener<IndexResponse>() {
    @Override
    public void onResponse(IndexResponse indexResponse) {
        
    }

    @Override
    public void onFailure(Exception e) {
        
    }
};

删除文档

删除文档需要构建DeleteRequest，下面是删除一个文档的例子，同样的我们也需要文档的id：

DeleteRequest request = new DeleteRequest("test","u7kkCG8B4ZIWSqMC3kJU");

其他可选参数：

request.routing("routing"); //路由值
request.timeout(TimeValue.timeValueMinutes(2)); //以TimeValue形式设置超时
request.timeout("2m");  //以字符串形式设置超时
request.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL); //以WriteRequest.RefreshPolicy实例的形式设置刷新策略
request.setRefreshPolicy("wait_for");  //以字符串的形式设置刷新策略            
request.version(2); //版本
request.versionType(VersionType.EXTERNAL); //版本类型

同样的，删除请求也可以分同步和异步执行，和上面的创建文档相同：

同步执行：

DeleteResponse deleteResponse = client.delete(
        request, RequestOptions.DEFAULT);

异步执行：

client.deleteAsync(request, RequestOptions.DEFAULT, listener);

修改文档

修改文档可以用脚本更新，这里我只记录用部分文档更新，因为看起来比较方便，与添加文档类似的，也有四种方式：

UpdateRequest request = new UpdateRequest("test", id);
String jsonString = "{" +
        ""url":"rasang.site"," +
        ""title":"rasang's blog"" +
        ""doc":"this is rasang's blog"" +
        "}";
request.doc(jsonString, XContentType.JSON);

Map<String, String> jsonMap = new HashMap<>();
jsonMap.put("url", "rasang.site");
jsonMap.put("title", "rasang's blog");
jsonMap.put("doc", "this is rasang's blog");
UpdateRequest request = new UpdateRequest("test", id)
        .doc(jsonMap);

XContentBuilder builder = XContentFactory.jsonBuilder();
builder.startObject();
{
    builder.field("url", "rasang.site");
    builder.field("title", "rasang's blog");
    builder.field("doc", "this is rasang's blog");
}
builder.endObject();
UpdateRequest request = new UpdateRequest("test", id)
        .doc(builder);

UpdateRequest request = new UpdateRequest("test", "1")
        .doc("url", "rasang.site",
             "title", "rasang's blog",
             "doc", "this is rasang's blog");

可选参数：

request.routing("routing"); //路由值
request.timeout(TimeValue.timeValueSeconds(1)); //设置超时
request.timeout("1s"); ////以字符串形式设置超时时间
request.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL); //以WriteRequest.RefreshPolicy实例形式设置刷新策略
request.setRefreshPolicy("wait_for");//以字符串形式刷新策略  
request.retryOnConflict(3); //如果要更新的文档在更新操作的获取和索引阶段之间被另一个操作更改，重试更新操作的次数
request.fetchSource(true); //启用源检索，默认情况下禁用
String[] includes = new String[]{"updated", "r*"};
String[] excludes = Strings.EMPTY_ARRAY;
request.fetchSource(
        new FetchSourceContext(true, includes, excludes)); //为特定字段配置源包含
String[] includes = Strings.EMPTY_ARRAY;
String[] excludes = new String[]{"updated"};
request.fetchSource(
        new FetchSourceContext(true, includes, excludes)); //为特定字段配置源排除
request.setIfSeqNo(2L); //ifSeqNo
request.setIfPrimaryTerm(1L); //ifPrimaryTerm
request.detectNoop(false); //禁用noop检测
request.scriptedUpsert(true); //指出无论文档是否存在，脚本都必须运行，即如果文档不存在，脚本负责创建文档。
request.docAsUpsert(true); //指示如果部分文档尚不存在，则必须将其用作upsert文档。
request.waitForActiveShards(2); //设置在继续更新操作之前必须活动的碎片副本数量。
request.waitForActiveShards(ActiveShardCount.ALL); //ActiveShardCount的碎片副本数。可选值：ActiveShardCount.ALL, ActiveShardCount.ONE或者 ActiveShardCount.DEFAULT

同步执行

UpdateResponse updateResponse = client.update(
        request, RequestOptions.DEFAULT);

异步执行

client.updateAsync(request, RequestOptions.DEFAULT, listener);

Java 操作——Bulk批量操作

批量操作需要用到Bulk API，下面我将展示批量操作新建文档，删除文档：

BulkRequest request = new BulkRequest();
request.add(new DeleteRequest("test", "3"));
request.add(new UpdateRequest("test", "2") 
        .doc(XContentType.JSON,"url", "plumk.site"));
request.add(new IndexRequest("posts").id("4")  
        .source(XContentType.JSON,"field", "baz"));

同步执行

client.bulkAsync(request, RequestOptions.DEFAULT, listener);

异步执行

client.bulkAsync(request, RequestOptions.DEFAULT, listener);

BulkResponse

BulkResponse是执行返回的结果，类似于ArrayList，可以迭代所有操作结果

for (BulkItemResponse e : bulkResponse) { 
    DocWriteResponse itemResponse = e.getResponse();

    switch (bulkItemResponse.getOpType()) {
    case INDEX:    
    case CREATE:
        IndexResponse indexResponse = (IndexResponse) itemResponse;
        break;
    case UPDATE:   
        UpdateResponse updateResponse = (UpdateResponse) itemResponse;
        break;
    case DELETE: 
        DeleteResponse deleteResponse = (DeleteResponse) itemResponse;
    }
}

Java 操作——搜索文档

首先我们要创建SearchRequest对象，指定test索引库

SearchRequest searchRequest = new SearchRequest("test");

可以指定多个查询文档库，如下：

SearchRequest searchRequest = new SearchRequest("posts2","posts", "posts2", "posts1");

又或者可以这么操作：

SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("posts2","posts", "posts2", "posts1");

如果你需要设置路由分片的话：

searchRequest.routing("routing");

指定优先分片查询：

searchRequest.preference("_local");

然后要添加请求的特征参数，添加特征参数我们需要用到SearchSourceBuilder对象

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

设置查询的起始索引位置和数量

如下表示从第1条开始，共返回5条文档数据

sourceBuilder.from(0); 
sourceBuilder.size(5);

Match all查询

searchSourceBuilder.query(QueryBuilders.matchAllQuery());

Multi match查询

搜索doc或url中含有rasang的文档

QueryBuilder queryBuilder = QueryBuilders.multiMatchQuery("rasang",
            "doc", "url");

WildcardQuery模糊查询

?匹配单个字符，*匹配多个字符，查询doc字段含有rasang的文档

WildcardQueryBuilder queryBuilder = QueryBuilders.wildcardQuery("doc",
            "*rasang*");

关键字包含查询

以上查询doc字段包含blog的文档

searchSource.query(QueryBuilders.termQuery("doc", "blog"));
// QueryBuilders输入的词条会被es解析并进行分词，在此过程中就已经转换成全小写。

matchQuery模糊查询

QueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("doc", "rasong")
        .fuzziness(Fuzziness.AUTO) //启用模糊查询
	    .prefixLength(3)// 在匹配查询上设置前缀长度选项
        .maxExpansions(10);// 设置最大扩展选项以控制查询的模糊过程
	searchSource.query(matchQueryBuilder);
//MatchQueryBuilder输入的词条会被es解析并进行分词，在此过程中就已经转换成全小写。

复合查询

//模糊查询
WildcardQueryBuilder queryBuilder1 = QueryBuilders.wildcardQuery(
            "doc", "*blog*");//搜索doc中含有blog的文档
WildcardQueryBuilder queryBuilder2 = QueryBuilders.wildcardQuery(
            "url", "*rasang*");//搜索url中含有rasang的文档
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
//doc中必须含有blog,url中必须含有rasang,相当于and
boolQueryBuilder.must(queryBuilder1);
boolQueryBuilder.must(queryBuilder2);

还有其他复合如should，mustNot，用法类似。