ES Project in Practice

Prerequisites

ES: Java. Stack: Spark/Flink + Spring Boot + ES (Scala/Java + Java/Scala + Java) ==> learn how to use ES through its APIs (the API itself and its use from Spring Boot). ES: API, RESTful.

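Elasticsearch + Kibana: storage plus display/analysis. ES plugins: Head, SQL, Kibana (three plugins; Kibana is counted as a plugin here). In the end the data should be reachable through SQL (for ease of use).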

Installing ES

Download: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.6.2.tar.gz

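Extract: tar -zxvf elasticsearch-6.6.2.tar.gz -C ~/app/
Add it to the system environment variables:
Clean up: cd elasticsearch-6.6.2, then delete the commands in the bin directory that end in .bat (they are for Windows and not needed here)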

$ES_HOME (important files)
bin
elasticsearch             starts ES in the foreground
elasticsearch -d          starts ES in the background (check the log files for its output)
elasticsearch-plugin      manages ES plugins
elasticsearch-sql-cli     SQL client
config

elasticsearch.yml: ES configuration

#cluster.name: my-application  (cluster name)
#node.name: node-1   (node name)
#path.data: /path/to/data
# Path to log files:
path.logs: /path/to/logs
#network.host: 0.0.0.0  (listen on all network interfaces)
# Set a custom port for HTTP:
http.port: 9200

jvm.options: JVM-related configuration for ES

## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms1g
-Xmx1g

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

# explicitly set the stack size
-Xss1m

# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=data

# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=logs/hs_err_pid%p.log

8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m
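
The `8:` prefix on the GC logging lines above is part of the jvm.options syntax: `8:` restricts a flag to Java 8, `8-:` means Java 8 and above, and `8-9:` means Java 8 through 9. Lines without a prefix apply to every version. The sketch below is only an illustration of that rule; the class `JvmOptionLine` and its method `flagFor` are hypothetical names, and this is not the parser ES itself uses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decides whether a jvm.options line applies to a Java major version.
final class JvmOptionLine {

    // Optional version prefix ("8:", "8-:" or "8-9:") followed by the JVM flag itself.
    private static final Pattern PREFIXED = Pattern.compile("^(\\d+)(-(\\d+)?)?:(-.*)$");

    // Returns the flag if the line applies to javaMajor, otherwise null.
    static String flagFor(String line, int javaMajor) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;                      // blank line or comment
        }
        if (trimmed.startsWith("-")) {
            return trimmed;                   // no prefix: applies to every version
        }
        Matcher m = PREFIXED.matcher(trimmed);
        if (!m.matches()) {
            return null;                      // not a line this sketch understands
        }
        int lower = Integer.parseInt(m.group(1));
        int upper;
        if (m.group(2) == null) {
            upper = lower;                    // "8:"   exactly Java 8
        } else if (m.group(3) == null) {
            upper = Integer.MAX_VALUE;        // "8-:"  Java 8 and above
        } else {
            upper = Integer.parseInt(m.group(3)); // "8-9:" Java 8 through 9
        }
        return (javaMajor >= lower && javaMajor <= upper) ? m.group(4) : null;
    }

    public static void main(String[] args) {
        System.out.println(flagFor("8:-Xloggc:logs/gc.log", 8));   // -Xloggc:logs/gc.log
        System.out.println(flagFor("8:-Xloggc:logs/gc.log", 11));  // null
        System.out.println(flagFor("-Xms1g", 11));                 // -Xms1g
    }
}
```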

***Note: ES is very demanding on disk (SSDs are the usual choice, together with plenty of memory)***

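Start ES in the background: elasticsearch -d
Check it in the browser: hadoop000:9200 is the HTTP port (Web UI port); hadoop000:9300 is the server port used for node-to-node and Java client traffic. The response shows lucene_version: "7.6.0"; the Lucene version number does not match the ES version number.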
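
The same check can be done from Java instead of the browser. The following is a minimal sketch using only the JDK; the class `EsNodeInfo` and its methods are hypothetical names, and the host `hadoop000` with port 9200 is taken from these notes, so adjust both to your own setup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: fetches the node info JSON that ES serves at its HTTP root.
final class EsNodeInfo {

    // Builds the root URL of a node's HTTP endpoint, e.g. http://hadoop000:9200/
    static String rootUrl(String host, int httpPort) {
        return "http://" + host + ":" + httpPort + "/";
    }

    // Sends GET to the given URL and returns the response body as text.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String url = rootUrl("hadoop000", 9200);
        try {
            System.out.println(fetch(url));
        } catch (IOException e) {
            // ES is not running, or the host name does not resolve on this machine.
            System.err.println("Could not reach " + url + ": " + e);
        }
    }
}
```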

技巧:用chrome中的json字符串美化插件 (JSON Formatter)

ES Core Concepts

Cluster  
Node  
Index       Database   
Type        Table  
Document    Row
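Field       Column   // the REST APIs for these four are the ones used most at work
shard (a partition of an index; the set of partitions is called its shards)
replica (a copy of a shard)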

Mapping (ES terms on the first line, relational database terms on the second):
Index -> Type -> Document -> Field
Database -> Table -> Row -> Column
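
In the ES 6.x REST API this hierarchy shows up directly in the URL of a document: /{index}/{type}/{id}, with the fields forming the JSON body. The sketch below only builds such a path; `EsRestPaths`, `documentPath`, and the names `blog`, `article`, and `1` are hypothetical and chosen for illustration. Note that 6.x allows a single type per index and later major versions drop types, so the Type = Table analogy should not be leaned on too heavily.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper showing how Index -> Type -> Document maps onto a REST path.
final class EsRestPaths {

    // ES 6.x addresses a single document as /{index}/{type}/{id}.
    static String documentPath(String index, String type, String id) {
        return "/" + index + "/" + type + "/" + id;
    }

    public static void main(String[] args) {
        // index "blog" ~ database, type "article" ~ table, id "1" ~ row key
        System.out.println(documentPath("blog", "article", "1")); // prints /blog/article/1

        // The document's fields (~ columns) are the keys of its JSON body.
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("title", "hello");
        fields.put("views", 10);
        System.out.println(fields.keySet()); // prints [title, views]
    }
}
```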

Code Development

Requirements: IDEA + Maven + Java

pom.xml

    <!-- Add the Elasticsearch dependency -->
    <dependency>
      <groupId>org.elasticsearch.client</groupId>
      <artifactId>transport</artifactId>
      <version>6.6.2</version>
    </dependency>

    <!-- Add the JUnit dependency (present by default) -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <!-- <scope>test</scope> was removed here; re-import the dependencies after removing it -->
    </dependency>

PreBuiltTransportClient.java

    /**
     * Creates a new transport client with pre-installed plugins.
     *
     * @param settings the settings passed to this transport client
     * @param plugins  an optional array of additional plugins to run with this client
     */
    @SafeVarargs
    public PreBuiltTransportClient(Settings settings, Class<? extends Plugin>... plugins) {
        this(settings, Arrays.asList(plugins));
    }
    // Class<? extends Plugin>...
    // The "..." marks a varargs parameter: zero, one, or several arguments may be passed, so it is optional.
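
To see the varargs behaviour in isolation, here is a small JDK-only sketch with the same parameter shape as the constructor above (one fixed argument followed by `Class<...>...`). `VarargsDemo` and `collect` are hypothetical names, and `Number` stands in for `Plugin` so that the example compiles without the ES dependency.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo of a varargs parameter with a bounded wildcard type.
final class VarargsDemo {

    // Same shape as the constructor above: a fixed argument, then zero or more classes.
    @SafeVarargs
    static List<Class<? extends Number>> collect(String label, Class<? extends Number>... types) {
        // Inside the method the varargs parameter is an ordinary array.
        return Arrays.asList(types);
    }

    public static void main(String[] args) {
        System.out.println(collect("none").size());                          // prints 0
        System.out.println(collect("one", Integer.class).size());            // prints 1
        System.out.println(collect("two", Integer.class, Long.class).size()); // prints 2
    }
}
```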

TransportClient.java

    /**
     * Adds a transport address that will be used to connect to.
     * <p>
     * The Node this transport address represents will be used if its possible to connect to it.
     * If it is unavailable, it will be automatically connected to once it is up.
     * <p>
     * In order to get the list of all the current connected nodes, please see {@link #connectedNodes()}.
     */
    public TransportClient addTransportAddress(TransportAddress transportAddress) {
        nodesService.addTransportAddresses(transportAddress);
        return this;
    }
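
Because addTransportAddress ends with `return this`, calls to it can be chained on one client. The sketch below shows that pattern on its own using only the JDK; `FluentAddresses` is a hypothetical stand-in and not part of the ES client, and the second host `hadoop001` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mimics the "return this" style of addTransportAddress.
final class FluentAddresses {

    private final List<String> addresses = new ArrayList<>();

    // Records the address and returns the same object, so calls can be chained.
    FluentAddresses add(String host, int port) {
        addresses.add(host + ":" + port);
        return this;
    }

    List<String> all() {
        return addresses;
    }

    public static void main(String[] args) {
        FluentAddresses nodes = new FluentAddresses()
                .add("hadoop000", 9300)
                .add("hadoop001", 9300);
        System.out.println(nodes.all()); // prints [hadoop000:9300, hadoop001:9300]
    }
}
```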