1. Model level
### 1. DataStream API
Using a Data Source:
environment.fromSource(
        Source<OUT, ?, ?> source,
        WatermarkStrategy<OUT> timestampsAndWatermarks,
        String sourceName)
Legacy variant: StreamExecutionEnvironment.addSource(sourceFunction)
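The two entry points above can be sketched as follows (a minimal sketch, assuming Flink is on the classpath; NumberSequenceSource is a bounded example source shipped in flink-core in recent Flink versions):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.connector.source.lib.NumberSequenceSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FromSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // New-style source (Data Source API): a Source, a WatermarkStrategy, and a name.
        env.fromSource(
                new NumberSequenceSource(1, 100),   // bounded example source from flink-core
                WatermarkStrategy.noWatermarks(),   // no event-time timestamps needed here
                "number-source")
           .print();

        env.execute("fromSource sketch");
    }
}
```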
### 2. DataSet API
DataSet Transformations
### 3. Table API & SQL
Dependencies for Java development:
flink-table-common
flink-table-api-java-bridge
flink-table-planner-blink
flink-table-runtime-blink
Imports:
org.apache.flink.table.api.Table;
org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
org.apache.flink.table.api.bridge.java.BatchTableEnvironment;
Flink 1.11 introduced new Table Source and Sink interfaces (DynamicTableSource and DynamicTableSink):
org.apache.flink.table.connector.source
org.apache.flink.table.connector.sink
These interfaces unify batch and streaming jobs.
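A minimal sketch of wiring up a StreamTableEnvironment (assuming the bridge and planner dependencies above are on the classpath; the query is illustrative):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class TableApiSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Build an ad-hoc inline table and query it with SQL
        // (f0 is the default column name for atomic values).
        Table numbers = tEnv.fromValues(1, 2, 3);
        tEnv.createTemporaryView("numbers", numbers);
        Table doubled = tEnv.sqlQuery("SELECT f0 * 2 FROM numbers");

        doubled.execute().print();  // collect results to stdout
    }
}
```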
2. Data Types
Supported Data Types
Type handling
Creating a TypeInformation or TypeSerializer
Data Types in the Table API
The Table API uses org.apache.flink.table.types.DataType, including when defining connectors, catalogs, or user-defined functions.
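The two type systems can be contrasted in a short sketch (assumes flink-core and flink-table-common on the classpath): TypeInformation drives DataStream/DataSet serialization, while DataType describes logical types for the Table API.

```java
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

public class TypesSketch {
    public static void main(String[] args) {
        // Capture a generic type for the DataStream/DataSet runtime.
        TypeInformation<Tuple2<String, Integer>> info =
                TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {});

        // Declare a logical row type for the Table API
        // (used when defining connectors, catalogs, or UDFs).
        DataType row = DataTypes.ROW(
                DataTypes.FIELD("name", DataTypes.STRING()),
                DataTypes.FIELD("count", DataTypes.INT()));
    }
}
```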
3. Connectors
In terms of data, there are three kinds of connectors:
DataStream Connectors
DataSet Connectors
Table & SQL Connectors
Roles:
01. DataStream Connectors
Predefined Sources and Sinks
Bundled Connectors
Connectors in Apache Bahir
Other Ways to Connect to Flink
Data Enrichment via Async I/O
Queryable State
02. DataSet Connectors
file systems
other systems using Input/OutputFormat wrappers for Hadoop
03. Table & SQL Connectors: register table sources and table sinks
Flink’s table connectors.
User-defined Sources & Sinks: developing a custom, user-defined connector.
Phases: Metadata, Planning, Runtime
Implementation:
Dynamic Table Source
Dynamic Table Sink
Dynamic Table Factories
Encoding / Decoding Formats
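A skeleton of a custom scan source gives the shape of the interface (a sketch assuming Flink's table connector API on the classpath; the class name and the inline SourceFunction body are illustrative placeholders):

```java
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.connector.source.ScanTableSource;
import org.apache.flink.table.connector.source.SourceFunctionProvider;
import org.apache.flink.table.data.RowData;

public class MyScanSource implements ScanTableSource {

    @Override
    public ChangelogMode getChangelogMode() {
        return ChangelogMode.insertOnly();  // this source only emits inserts
    }

    @Override
    public ScanRuntimeProvider getScanRuntimeProvider(ScanContext ctx) {
        // Bridge planning to a runtime implementation; the function body is a stub.
        SourceFunction<RowData> fn = new SourceFunction<RowData>() {
            @Override public void run(SourceContext<RowData> out) { /* emit rows here */ }
            @Override public void cancel() { }
        };
        return SourceFunctionProvider.of(fn, /* bounded = */ true);
    }

    @Override
    public DynamicTableSource copy() {
        return new MyScanSource();
    }

    @Override
    public String asSummaryString() {
        return "MyScanSource";
    }
}
```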
Predefined Sources and Sinks
1. Pre-defined source connectors
Custom Sources (SourceOperator):
flink-core
org.apache.flink.api.connector.source.SourceSplit
org.apache.flink.api.connector.source.SourceReader
org.apache.flink.api.connector.source.SplitEnumerator
org.apache.flink.api.connector.source.event.NoMoreSplitsEvent
Useful when implementing a new data source or when trying to understand how Flink's data sources work.
Sources and sinks are often summarized under the term connector.
4. Refactored Source Interface
1. Data Source API
The Source abstraction Flink provides: the Data Source API
01. A Data Source has three core components:
Splits, the SplitEnumerator, and the SourceReader.
In the bounded/batch case,
the enumerator generates a fixed set of splits, and each split is necessarily finite.
Once reading completes, NoMoreSplits is returned: a finite set of splits, each of which is bounded.
In the unbounded streaming case,
at least one of the two does not hold (splits are not finite, or the enumerator keeps generating new splits).
For example:
Bounded File Source
Unbounded Streaming File Source
the SplitEnumerator never responds with NoMoreSplits and periodically checks for new content
02. The Source API is a factory-style interface for creating the following components:
Split Serializer
Split Enumerator
Enumerator Checkpoint Serializer
Source Reader (consumes records from the splits)
03. The SplitReader is the high-level API for simple synchronous reading/polling-based source implementations, used together with:
SourceReaderBase
SplitFetcherManager
Event Time and Watermarks for sources: do not use the legacy assigner functions, since the new sources already assign timestamps and watermarks.
2. Data Source Function
01. Pre-defined Sources and Sinks
(built into Flink and usable directly, typically for debugging and verification; no external dependencies needed)
Pre-implemented source functions:
File-based
Socket-based
Collection-based
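The three families above can be shown in one short sketch (assumes Flink on the classpath; the file path and host/port are illustrative):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PredefinedSourcesSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c").print();            // Collection-based
        env.readTextFile("file:///tmp/input.txt").print();  // File-based (path illustrative)
        env.socketTextStream("localhost", 9999).print();    // Socket-based (host/port illustrative)

        env.execute("predefined sources sketch");
    }
}
```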
02. Connectors provide code for interfacing with various third-party systems.
001. Flink already ships some bundled connectors (the relevant connector classes must be packaged into the job jar):
public abstract class KafkaDynamicSinkBase implements DynamicTableSink
public interface ScanTableSource extends DynamicTableSource
org.apache.flink.table.connector.sink.DynamicTableSink
org.apache.flink.table.connector.source.DynamicTableSource
002. Connectors in Apache Bahir
03. Flink provides an Async I/O API, typically used to access external databases.
Async I/O handles multiple requests concurrently, increasing throughput and reducing latency.
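The underlying idea (which Flink's AsyncDataStream/AsyncFunction wrap) can be illustrated with plain JDK futures; all names below are illustrative, not Flink API:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class AsyncLookupSketch {
    // Simulated slow external lookup (e.g. a database query); illustrative only.
    static CompletableFuture<String> lookup(int key) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(50); // pretend network/database latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "value-" + key;
        });
    }

    public static void main(String[] args) {
        // Issue all lookups concurrently instead of one at a time, so total
        // latency is roughly one round trip rather than ten.
        List<CompletableFuture<String>> inFlight = IntStream.range(0, 10)
                .mapToObj(AsyncLookupSketch::lookup)
                .collect(Collectors.toList());

        List<String> results = inFlight.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());

        System.out.println(results);
    }
}
```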
04. Queryable State