Elasticsearch Series (3): Understanding Databases and Tables in ES

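First of all, ES has no concept of databases or tables, only index, type and document (for detailed terminology, see part one of this ES series: http://www.cnblogs.com/ulysses-you/p/6736926.html). To get up to speed quickly, you can map them roughly onto a relational database: index ≈ database, type ≈ table, document ≈ row.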

Below is my understanding of these concepts.

Index

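1. In ES, each shard of an index is a Lucene index, and every Lucene index carries a fixed overhead in disk space, memory and file descriptors. So don't create ES indices carelessly: one index holding a lot of data is more efficient than many small indices, and using several types instead of several indices reduces the number of Lucene indices ES has to manage.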

2. Avoid searching many indices at once. During a search, ES has to collect results from every shard of every index being searched, which is expensive.
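To make both points concrete, here is a back-of-the-envelope sketch in Python. It assumes the pre-7.0 defaults of 5 primary shards and 1 replica per index, and that every shard copy (primary or replica) is one Lucene index; the index names and shard layout are hypothetical. It only counts Lucene indices and shard-level searches and does not talk to a cluster.

```python
from __future__ import annotations


# Assumption: each shard copy (primary or replica) is one Lucene index with its
# own disk, memory and file-descriptor overhead.
def lucene_index_count(num_indices: int, primaries: int, replicas: int) -> int:
    """Lucene indices created cluster-wide by num_indices identical ES indices."""
    return num_indices * primaries * (1 + replicas)


# A search fans out to one copy of every shard of every index it names.
def shards_searched(layout: dict[str, int], indices: list[str]) -> int:
    """layout maps index name -> number of primary shards (hypothetical cluster)."""
    return sum(layout[name] for name in indices)


def search_path(indices: list[str]) -> str:
    """Comma-separated multi-index search endpoint, e.g. /a,b/_search."""
    return "/" + ",".join(indices) + "/_search"


# Ten small indices vs. one large index, both with 5 primaries and 1 replica
print(lucene_index_count(10, 5, 1), lucene_index_count(1, 5, 1))  # 100 10

# Hypothetical index names and shard counts
layout = {"logs-2017.04": 5, "logs-2017.05": 5, "users": 1}
targets = ["logs-2017.04", "logs-2017.05"]
print(search_path(targets), shards_searched(layout, targets))  # /logs-2017.04,logs-2017.05/_search 10
```

Ten small indices cost ten times the Lucene indices of one large one, and a search over two 5-shard indices already touches 10 shards.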

Type

1. Within one index, searching several types costs no more resources than searching a single type.

2. Fields must stay consistent: if two types in the same index each have a field with the same name, the two fields must have identical properties (mapping).

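3. Avoid sparse fields where possible (unlike HBase tables, which are sparse by design). A field that exists for some documents also consumes resources for the documents that lack it; this is a general weakness of Lucene: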

· Sparse fields make compression less effective.

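· Lucene reserves a fixed amount of disk space per document to speed up addressing, whether or not the document actually has a value for the field.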

4. Because scoring statistics are computed index-wide, the scores of documents in one type are influenced by the documents in other types.

5. A single sparse index is more harmful than splitting it into several indices.
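Items 2 to 4 can be illustrated with a small local sketch in Python. ES enforces item 2 itself when mappings are applied; the checker below only mirrors the rule on plain dicts. The type names, documents and term counts are hypothetical, and the IDF uses the BM25 form found in Lucene's `BM25Similarity` (BM25 is the default similarity from ES 5.0), simplified so that the document count is simply every document searched.

```python
from __future__ import annotations

import math


# Item 2: same-named fields in different types of one index must share a definition.
def find_field_conflicts(type_mappings: dict[str, dict[str, dict]]) -> dict[str, list[str]]:
    """Return {field: [types...]} for fields whose properties differ across types."""
    first_def: dict[str, dict] = {}   # field -> first definition seen
    first_type: dict[str, str] = {}   # field -> type that defined it first
    conflicts: dict[str, list[str]] = {}
    for type_name, fields in type_mappings.items():
        for field, props in fields.items():
            if field not in first_def:
                first_def[field], first_type[field] = props, type_name
            elif first_def[field] != props:
                conflicts.setdefault(field, [first_type[field]]).append(type_name)
    return conflicts


# Item 3: how sparse is each field? (share of documents that actually contain it)
def field_fill_rates(docs: list[dict]) -> dict[str, float]:
    counts: dict[str, int] = {}
    for doc in docs:
        for field in doc:
            counts[field] = counts.get(field, 0) + 1
    return {field: n / len(docs) for field, n in counts.items()}


# Item 4: BM25 IDF; doc_count is simplified to "all documents searched".
def bm25_idf(doc_count: int, doc_freq: int) -> float:
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


mappings = {  # hypothetical "user" and "order" types in one index
    "user": {"name": {"type": "text"}, "age": {"type": "integer"}},
    "order": {"name": {"type": "keyword"}, "amount": {"type": "double"}},
}
print(find_field_conflicts(mappings))  # {'name': ['user', 'order']}

docs = [
    {"id": 1, "name": "alice", "age": 30},  # user documents
    {"id": 2, "name": "bob", "age": 25},
    {"id": 3, "amount": 9.5},               # order documents
    {"id": 4, "amount": 20.0},
]
print(field_fill_rates(docs))  # {'id': 1.0, 'name': 0.5, 'age': 0.5, 'amount': 0.5}

# "apple" is rare in 10 "news" docs but common in 90 "recipe" docs of the same index
idf_news_only = bm25_idf(10, 1)
idf_index_wide = bm25_idf(10 + 90, 1 + 45)
print(idf_news_only > idf_index_wide)  # True
```

Mixing the two hypothetical types leaves every type-specific field only half filled, and the "recipe" documents drag down how much a match on "apple" is worth in a "news" search.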

Summary

Questions to ask yourself when choosing a storage layout

Are you using parent/child? If yes this can only be done with two types in the same index.
Do your documents have similar mappings? If no, use different indices.
If you have many documents for each type, then the overhead of Lucene indices will be easily amortized so you can safely use indices, with fewer shards than the default of 5 if necessary.
Otherwise you can consider putting documents in different types of the same index. Or even in the same type.
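The checklist reads naturally as a decision function. A minimal Python sketch, assuming each question can be answered with a yes/no flag (the function and parameter names are hypothetical). It reflects the ES 5.x era of this post: from ES 6.0 on, a new index can hold only one type and parent/child is modeled with the `join` field instead.

```python
def choose_layout(uses_parent_child: bool, similar_mappings: bool, many_docs_per_type: bool) -> str:
    """Walk the checklist questions in order and return the suggested layout."""
    if uses_parent_child:
        # parent/child requires both types to live in the same index
        return "two types in the same index"
    if not similar_mappings:
        return "different indices"
    if many_docs_per_type:
        # Lucene overhead is amortized; fewer than 5 shards per index may be enough
        return "different indices, with fewer shards if necessary"
    return "different types in the same index, or even the same type"


print(choose_layout(uses_parent_child=False, similar_mappings=True, many_docs_per_type=False))
```

The order of the checks matters: parent/child overrides everything else, and only small, similarly mapped data sets end up sharing one index.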

Common patterns

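1. One index containing five types is nearly equivalent to five indices with one shard each.

2. If documents have different mappings, create separate indices.

3. In general, use cases that really need multiple types are rare.

4. For higher write throughput, use more shards; for higher read throughput, use fewer shards.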
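Patterns 1 and 4 both come down to shard counts, which are set in the create-index request. A minimal Python sketch that only builds the JSON body and sends nothing; `number_of_shards` and `number_of_replicas` are real index settings, while the helper names are hypothetical, and pattern 1 assumes the single index keeps the old default of 5 primary shards.

```python
from __future__ import annotations

import json


def create_index_body(primary_shards: int, replicas: int = 1) -> dict:
    """Request body for creating an index (PUT /<index>) with explicit shard settings."""
    return {"settings": {"number_of_shards": primary_shards, "number_of_replicas": replicas}}


def primary_shard_total(bodies: list[dict]) -> int:
    """Total primary shards, i.e. primary Lucene indices, across create-index bodies."""
    return sum(body["settings"]["number_of_shards"] for body in bodies)


# Pattern 1: one 5-shard index (holding 5 types) vs. five 1-shard indices
one_index_five_types = [create_index_body(5)]
five_single_shard_indices = [create_index_body(1) for _ in range(5)]
print(primary_shard_total(one_index_five_types), primary_shard_total(five_single_shard_indices))  # 5 5

# Pattern 4: the shard count is chosen when the index is created
print(json.dumps(create_index_body(primary_shards=1)))
# {"settings": {"number_of_shards": 1, "number_of_replicas": 1}}
```

Both layouts in pattern 1 end up with five primary Lucene indices, which is why they behave similarly.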

References

// Official comparison of index vs. type

https://www.elastic.co/blog/index-vs-type

// A very detailed ES blog post by a foreign author

https://blog.insightdatascience.com/anatomy-of-an-elasticsearch-cluster-part-i-7ac9a13b05db

New blog address: http://ixiaosi.art/ Feel free to visit :)