Elasticsearch: Result window is too large, from + size must be less than or equal to: [10000]

Elasticsearch version: 7.8

Error message:

{"error":{"root_cause":[{"type":"query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"index_name_xxx","node":"LaKU2ESgT8SL0_IJk8znWA","reason":{"type":"query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}}]},"status":500}

Fix: raise the limit by updating the index settings with curl (`_all` applies the change to every index):

curl -u "<username>:<password>" -i -H "Content-Type: application/json" -X PUT http://127.0.0.1:9200/_all/_settings  -d '{ "index.max_result_window" :"2147483647"}'

But why does the max_result_window setting exist at all?

Let's reproduce it in a test environment.

The test environment does not have that much data, so set the index's max_result_window to 100:

curl -u "<username>:<password>" -i -H "Content-Type: application/json" -X PUT http://127.0.0.1:9200/index_name_xxx/_settings  -d '{ "index.max_result_window" :"100"}'

Check the settings after the change:

curl -u "<username>:<password>" -X GET "http://127.0.0.1:9200/index_name_xxx/_settings?pretty"

Run a query:

{"from":99,"size":11,"query":{"match_all":{"boost":1.0}}}

The same exception is thrown, because from + size = 99 + 11 = 110 exceeds the limit of 100.
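The rule in the error message can be written down directly. This is a minimal Python sketch that assumes the check is only `from + size > index.max_result_window`. `result_window_error` is a hypothetical helper for illustration and is not Elasticsearch's own code. A page that ends exactly at the limit passes, because the message says "less than or equal to".

```python
from typing import Optional


def result_window_error(from_: int, size: int, max_result_window: int = 10000) -> Optional[str]:
    """Hypothetical helper: return an error message when from + size exceeds
    max_result_window, otherwise None. It mirrors only the rule quoted in the
    error text, not Elasticsearch's actual implementation."""
    window = from_ + size
    if window > max_result_window:
        return (
            "Result window is too large, from + size must be less than or "
            f"equal to: [{max_result_window}] but was [{window}]."
        )
    return None


# The test query above: from=99, size=11 against index.max_result_window=100.
print(result_window_error(99, 11, 100))
# A page ending exactly at the limit (90 + 10 = 100) is still allowed.
print(result_window_error(90, 10, 100))
```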

From the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/paginate-search-results.html#paginate-search-results

Avoid using from and size to page too deeply or request too many results at once. Search
requests usually span multiple shards. Each shard must load its requested hits and the hits for
any previous pages into memory. For deep pages or large sets of results, these operations can
significantly increase memory and CPU usage, resulting in degraded performance or node failures.

By default, you cannot use from and size to page through more than 10,000 hits. This limit is a
safeguard set by the index.max_result_window index setting. If you need to page through more
than 10,000 hits, use the search_after parameter instead.
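The simulation below suggests why search_after scales better. It is a simplified Python sketch, not Elasticsearch code. Instead of an offset, the client sends the sort values of the last hit it received, so each shard returns at most `size` hits that sort after that key. No shard ever rebuilds the earlier pages.

Everything in the sketch is hypothetical: the three in-memory "shards", the `(timestamp, doc_id)` sort key and both function names. In a real request the last hit's sort values go into the `search_after` array together with a matching `sort` clause. The sort should include a unique tiebreaker field.

```python
import heapq
from typing import List, Optional, Tuple

# Hypothetical sort key: (timestamp, doc_id). doc_id is a unique tiebreaker,
# so no two hits share the same sort values.
Hit = Tuple[int, str]


def shard_search_after(shard: List[Hit], size: int, after: Optional[Hit]) -> List[Hit]:
    """One shard returns at most `size` hits that sort strictly after `after`."""
    candidates = [h for h in shard if after is None or h > after]
    return heapq.nsmallest(size, candidates)


def search_after_page(shards: List[List[Hit]], size: int, after: Optional[Hit]) -> List[Hit]:
    """Coordinator: merge each shard's (at most) `size` hits and keep the first `size`."""
    per_shard = [shard_search_after(s, size, after) for s in shards]
    return heapq.nsmallest(size, (h for hits in per_shard for h in hits))


shards = [
    [(1, "a"), (4, "d"), (7, "g")],
    [(2, "b"), (5, "e"), (8, "h")],
    [(3, "c"), (6, "f"), (9, "i")],
]

after: Optional[Hit] = None
while True:
    page = search_after_page(shards, size=2, after=after)
    if not page:
        break
    print(page)
    after = page[-1]  # the last hit's sort values become the next search_after
```

Each shard's work depends only on `size`, however deep the client has paged. That is why search_after is not bounded by index.max_result_window.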

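The key sentence is: "Each shard must load its requested hits and the hits for any previous pages into memory."
The official documentation makes it clear: this problem was caused by paging "too deeply". The underlying reason is that for a from/size query, each shard loads the hits for the requested page plus the hits for all previous pages into memory. This can consume a large amount of memory, so Elasticsearch caps the window with index.max_result_window.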
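Some back-of-the-envelope arithmetic shows the cost. With from/size, each shard keeps its top from + size hits. The coordinating node then merges up to number_of_shards × (from + size) entries, only to return `size` of them. In the query phase these entries are doc IDs and sort values rather than full documents.

The Python sketch below assumes a 5-shard index and page 1000 with 10 hits per page. Both numbers are hypothetical. `deep_page_cost` is an illustrative helper.

```python
def deep_page_cost(from_: int, size: int, num_shards: int) -> dict:
    """Count the sort entries (doc IDs + sort values) handled for one
    from/size request. Illustrative arithmetic only."""
    per_shard = from_ + size  # each shard keeps its top from + size hits
    return {
        "per_shard": per_shard,
        "coordinator": num_shards * per_shard,  # entries merged by the coordinating node
        "returned": size,  # hits the client actually receives
    }


# Hypothetical: page 1000 (from = 999 * 10) with 10 hits per page on 5 shards.
print(deep_page_cost(from_=9990, size=10, num_shards=5))
# {'per_shard': 10000, 'coordinator': 50000, 'returned': 10}
```

This is the deepest page the default limit of 10000 allows, and the coordinator still merges 50000 entries to return 10 hits.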

One last question: why does an Elasticsearch query have to "load its requested hits and the hits for any previous pages into memory"?
Is it related to how the data is split into shards?
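A small simulation suggests the answer is yes. This is a minimal Python sketch under simplifying assumptions. A shard is modelled as a plain list of relevance scores, and the coordinator merges by score. Real Elasticsearch merges doc IDs plus sort values in the query phase and fetches documents afterwards.

A shard cannot know which of its hits fall within the first `from` results globally. It therefore has to hand over its whole top from + size. The score data and the two "shortcut" functions are hypothetical, included to show what goes wrong otherwise.

```python
import heapq
from typing import List

# Hypothetical relevance scores on two shards (higher score ranks first).
shards: List[List[int]] = [
    [100, 99, 98, 60],
    [97, 50, 40, 30],
]


def page_from_size(shards: List[List[int]], from_: int, size: int) -> List[int]:
    """What from/size has to do: every shard returns its top from + size hits,
    the coordinator merges them, and only then skips from_ and keeps size."""
    window = from_ + size
    per_shard = [heapq.nlargest(window, s) for s in shards]
    merged = heapq.nlargest(window, (h for hits in per_shard for h in hits))
    return merged[from_:window]


def page_skip_locally(shards: List[List[int]], from_: int, size: int) -> List[int]:
    """Wrong shortcut 1: each shard skips from_ hits of its own data."""
    per_shard = [sorted(s, reverse=True)[from_:from_ + size] for s in shards]
    return heapq.nlargest(size, (h for hits in per_shard for h in hits))


def page_size_only(shards: List[List[int]], from_: int, size: int) -> List[int]:
    """Wrong shortcut 2: each shard returns only its top `size` hits."""
    per_shard = [heapq.nlargest(size, s) for s in shards]
    merged = sorted((h for hits in per_shard for h in hits), reverse=True)
    return merged[from_:from_ + size]


print(page_from_size(shards, from_=2, size=2))     # [98, 97]: the true 3rd and 4th hits
print(page_skip_locally(shards, from_=2, size=2))  # [98, 60]: 97 was skipped on shard 2
print(page_size_only(shards, from_=2, size=2))     # [97, 50]: 98 was never sent by shard 1
```

Only the first function returns the correct page. Its per-shard window grows with `from`, which is exactly the cost that index.max_result_window guards against.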

Original article: https://www.cnblogs.com/xxoome/p/13914496.html