Elasticsearch: A Deep Dive into the Underlying Data Structure of Types

1. Theory

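A type is used within an index to separate similar data. The data is alike, but it may have different fields and different properties that control how it is indexed and which analyzer is applied.
When a field value is indexed in the underlying Lucene, it is always stored as opaque bytes. In other words, Lucene does not distinguish between value types.
Lucene has no concept of a type. Inside a document, the type is actually stored as just another field of the document, named _type. Elasticsearch filters and selects documents by type by filtering on _type.
All the types of one index are actually stored together. Therefore, within one index, two types cannot define a field with the same name but a different datatype or other differing settings, because such a conflict cannot be handled.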
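The following Python sketch is a toy model, not Elasticsearch or Lucene code. It assumes three things: all documents of an index live in one flat store, the type name is kept as an ordinary _type field, and all types share a single field-to-datatype table. The class ToyIndex and its methods are hypothetical names chosen for illustration.

```python
class ToyIndex:
    """Toy model (hypothetical): all types of one index share a single flat store."""

    def __init__(self):
        self.docs = []          # every document, regardless of its type
        self.field_types = {}   # field name -> datatype, shared by ALL types

    def index(self, type_name, doc, datatypes):
        # A field name may be reused by several types only if its datatype agrees.
        # Check everything first so a rejected document leaves no partial state.
        for field, dtype in datatypes.items():
            existing = self.field_types.get(field)
            if existing is not None and existing != dtype:
                raise ValueError(
                    f"field [{field}] is {existing} in another type, cannot be {dtype}"
                )
        self.field_types.update(datatypes)
        # The type is not a separate structure: it is stored as an ordinary _type field.
        self.docs.append({"_type": type_name, **doc})

    def search_type(self, type_name):
        # Restricting a search to one type is just a filter on the _type field.
        return [d for d in self.docs if d["_type"] == type_name]


idx = ToyIndex()
idx.index("electronic_goods",
          {"name": "geli kongtiao", "price": 1999.0, "service_period": "one year"},
          {"name": "text", "price": "float", "service_period": "text"})
idx.index("eat_goods",
          {"name": "aozhou dalongxia", "price": 199.0, "eat_period": "one week"},
          {"name": "text", "price": "float", "eat_period": "text"})
print([d["name"] for d in idx.search_type("eat_goods")])  # ['aozhou dalongxia']
```

In this model, filtering on _type is all it takes to scope a search to one type. The shared datatype table is why a field name cannot carry two different datatypes across the types of one index.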

2. Hands-on Example

(1) Insert two documents

PUT goods_index/electronic_goods/1
{
"name": "geli kongtiao",
"price": 1999.0,
"service_period": "one year"
}

PUT goods_index/eat_goods/2
{
"name": "aozhou dalongxia",
"price": 199.0,
"eat_period": "one week"
}
The index is named goods_index.

Under this index there are two types: electronic_goods and eat_goods.

Let's look at the mapping of the index.

(2) View the mapping

GET /goods_index/_mapping

{
  "goods_index": {
    "mappings": {
      "electronic_goods": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "price": {
            "type": "float"
          },
          "service_period": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "eat_goods": {
        "properties": {
          "eat_period": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "price": {
            "type": "float"
          }
        }
      }
    }
  }
}

The types of one index are actually stored together. In the underlying Lucene, the storage structure looks like this:

(3) Underlying Lucene storage

{
   "goods_index": {
      "mappings": {
        "_type": {
          "type": "string",
          "index": "not_analyzed"
        },
        "name": {
          "type": "string"
        },
        "price": {
          "type": "float"
        },
        "service_period": {
          "type": "string"
        },
        "eat_period": {
          "type": "string"
        }
      }
   }
}
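The merged structure above can be derived mechanically from the per-type mappings returned by GET /goods_index/_mapping. The Python sketch below assumes a simplified mapping dict that keeps only the type of each field and drops the keyword sub-fields. The function name merge_type_mappings is hypothetical. The sketch models _type as a keyword field, the counterpart of the not_analyzed string shown above.

```python
def merge_type_mappings(index_mapping):
    """Flatten the per-type mappings of one index into a single field table (toy model)."""
    merged = {"_type": "keyword"}  # the type name itself becomes a not-analyzed field
    for type_name, type_mapping in index_mapping["mappings"].items():
        for field, spec in type_mapping["properties"].items():
            dtype = spec["type"]
            # Fields with the same name across types collapse into one entry,
            # so their datatypes must agree.
            if field in merged and merged[field] != dtype:
                raise ValueError(
                    f"[{type_name}] maps [{field}] as {dtype}, "
                    f"but another type already maps it as {merged[field]}"
                )
            merged[field] = dtype
    return merged


goods_index = {
    "mappings": {
        "electronic_goods": {"properties": {
            "name": {"type": "text"},
            "price": {"type": "float"},
            "service_period": {"type": "text"},
        }},
        "eat_goods": {"properties": {
            "eat_period": {"type": "text"},
            "name": {"type": "text"},
            "price": {"type": "float"},
        }},
    }
}

print(sorted(merge_type_mappings(goods_index)))
# ['_type', 'eat_period', 'name', 'price', 'service_period']
```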

The two documents above are stored in the underlying layer as follows:

{
  "_type": "electronic_goods",
  "name": "geli kongtiao",
  "price": 1999.0,
  "service_period": "one year",
  "eat_period": ""
}

{
  "_type": "eat_goods",
  "name": "aozhou dalongxia",
  "price": 199.0,
  "service_period": "",
  "eat_period": "one week"
}
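The padding shown above can be sketched in Python as follows. This is a conceptual model that mirrors the illustration, in which a field that a document does not have appears as an empty string. The function flatten and the constant ALL_FIELDS are hypothetical.

```python
# Union of the fields of both types (hypothetical constant for this example).
ALL_FIELDS = ["name", "price", "service_period", "eat_period"]


def flatten(type_name, source, all_fields):
    """Return the doc as the illustration shows it: every field of the index, '' if absent."""
    row = {"_type": type_name}
    for field in all_fields:
        row[field] = source.get(field, "")
    return row


electronic = flatten("electronic_goods",
                     {"name": "geli kongtiao", "price": 1999.0, "service_period": "one year"},
                     ALL_FIELDS)
eat = flatten("eat_goods",
              {"name": "aozhou dalongxia", "price": 199.0, "eat_period": "one week"},
              ALL_FIELDS)
print(electronic)
# {'_type': 'electronic_goods', 'name': 'geli kongtiao', 'price': 1999.0, 'service_period': 'one year', 'eat_period': ''}
```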

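The _type field holds the type name. Both types contain a name field, and because the types share one storage space, the two name fields are really the same field. Suppose name were a date field in electronic_goods but a text field in eat_goods. The two definitions would then conflict, and this would cause a problem.

The underlying Lucene data structure stores the union of the fields of electronic_goods and eat_goods.
Types with a similar structure should be placed under one index, and such types should share many of their fields. If you put two types with completely different fields under one index, every document will have at least half of its fields empty in the underlying Lucene, which causes serious performance problems.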
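A rough way to quantify the waste is to count empty field slots under the padded model used above. The Python sketch below assumes one document per type and uses a hypothetical helper, empty_ratio. It only counts slots and says nothing about Lucene's actual on-disk costs.

```python
def empty_ratio(type_fields):
    """Fraction of field slots left empty when every doc carries the union of all fields.

    type_fields maps a type name to the set of fields its documents actually use;
    one document per type is assumed (a simplifying, hypothetical workload).
    """
    all_fields = set().union(*type_fields.values())
    total = len(type_fields) * len(all_fields)
    empty = sum(len(all_fields - used) for used in type_fields.values())
    return empty / total


similar = {"electronic_goods": {"name", "price", "service_period"},
           "eat_goods": {"name", "price", "eat_period"}}
disjoint = {"a": {"f1", "f2", "f3"}, "b": {"g1", "g2", "g3"}}

print(empty_ratio(similar))   # 0.25
print(empty_ratio(disjoint))  # 0.5
```

With two fully disjoint types, exactly half of every padded document is empty. This matches the "at least half" statement above, and the ratio only grows as more unrelated types are added.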