StructuredStream StateStore Mechanism

ref: https://jaceklaskowski.gitbooks.io/spark-structured-streaming/
StructuredStream's stateful implementation is built on StateStore, which remembers historical results and thereby enables unbounded streaming computation. Internally, the historical aggregation results are kept in a StateStore (currently backed by HDFS). Each micro-batch executes StateStoreRestore -> Agg -> StateStoreSave.
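
For context, the kind of user query that drives this pipeline is a streaming aggregation. Below is a minimal sketch; the socket source, host/port, and checkpoint path are illustrative assumptions, not taken from the original post.

// A word count over a socket source; groupBy + count is a stateful
// aggregation, so each micro-batch runs StateStoreRestore -> Agg -> StateStoreSave.
import org.apache.spark.sql.SparkSession

object StatefulWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("StatefulWordCount").master("local[2]").getOrCreate()
    import spark.implicits._

    val lines = spark.readStream.format("socket")
      .option("host", "localhost").option("port", 9999).load()

    val counts = lines.as[String]
      .flatMap(_.split("\\s+"))
      .groupBy($"value")
      .count()

    counts.writeStream
      .outputMode("complete")
      .format("console")
      .option("checkpointLocation", "/tmp/wordcount-checkpoint") // illustrative path; state versions are persisted under it
      .start()
      .awaitTermination()
  }
}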

The stateful mechanism relies on StateStoreRDD.

[Figure: logical plan]

Both StateStoreRestore and StateStoreSave are built on StateStoreRDD.


StateStoreRDD uses the StateStoreCoordinator to look up the location of the state, which it reports as its preferred location.
Its data comes from two places: the historical results in the StateStore and the RDD data of the new batch.

StateStoreRDD is an RDD for executing storeUpdateFunction with StateStore (and data from partitions of a new batch RDD).

Finally, StateStoreRDD merges the historical state with the new batch data:

// StateStoreRDD#compute
override def compute(partition: Partition, ctxt: TaskContext): Iterator[U] = {
    var store: StateStore = null
    val storeId = StateStoreId(checkpointLocation, operatorId, partition.index)
    store = StateStore.get(
      storeId, keySchema, valueSchema, storeVersion, storeConf, confBroadcast.value.value) // obtain the StateStore
    val inputIter = dataRDD.iterator(partition, ctxt)  // data of the new batch
    storeUpdateFunction(store, inputIter)  // combine them; the logic differs between Restore and Save
  }
storeUpdateFunction of StateStoreRestore

During Restore, the merge logic joins the historical state and the new batch data on the same key, mainly by calling store#get(key):

{ case (store, iter) =>
        val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output)
        iter.flatMap { row =>
          val key = getKey(row)
          val savedState = store.get(key)
          numOutputRows += 1
          row +: savedState.toSeq
        }
}
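
To make the Restore behavior concrete, here is a toy illustration in plain Scala (not Spark code; the data is made up): for each key in the new batch, the restored iterator carries the new row followed by whatever was previously saved for that key, so the downstream aggregation can merge them.

// savedState plays the role of the StateStore; newBatch is this batch's partial result
val savedState = Map("a" -> 3, "b" -> 1)
val newBatch   = Seq("a" -> 2, "c" -> 5)

val restored = newBatch.flatMap { case (k, v) =>
  // mirrors `row +: savedState.toSeq`: the new row first, then the saved state (if any)
  (k -> v) +: savedState.get(k).map(k -> _).toSeq
}
// restored == Seq("a" -> 2, "a" -> 3, "c" -> 5); the Agg that follows folds
// rows with the same key back into a single up-to-date value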
storeUpdateFunction of StateStoreSave (taking outputMode = Complete as the example); it mainly calls store#put(key, value):
{ (store, iter) =>
        val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output)
        ...
        outputMode match {
          // Update and output all rows in the StateStore.
          case Some(Complete) =>
            while (iter.hasNext) {
              val row = iter.next().asInstanceOf[UnsafeRow]
              val key = getKey(row)
              store.put(key.copy(), row.copy())
              numUpdatedStateRows += 1
            }
            store.commit()
            numTotalStateRows += store.numKeys()
            store.iterator().map { case (k, v) =>
              numOutputRows += 1
              v.asInstanceOf[InternalRow]
            }
...
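
Correspondingly, here is a toy illustration of the Complete-mode Save step (again plain Scala with made-up data, continuing the Restore sketch above): every merged row is written back into the store under its key, the store is committed as a new version, and the entire store contents become the batch output.

import scala.collection.mutable

val store      = mutable.Map("a" -> 3, "b" -> 1)   // state committed by previous batches
val mergedRows = Seq("a" -> 5, "c" -> 5)            // output of Agg for this batch

mergedRows.foreach { case (k, v) => store.put(k, v) }  // mirrors store.put(key, row)
// store.commit() would persist version N + 1 to HDFS at this point
val output = store.toSeq                             // Complete mode outputs the whole store
// output contains ("a" -> 5), ("b" -> 1), ("c" -> 5)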

StateStore (HDFSBackedStateStore)

A quick way to understand StateStore: intuitively, to implement stateful processing under the DStream framework we would also keep the historical state in an RDD and, after each new batch is computed, merge it with that historical RDD (using checkpoints to avoid an overly long lineage). That idea is perfectly sound and close to what StructuredStream does; the main additions are:

  1. the key/value schema (see the sketch after this list)
  2. preferred location optimization
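
Regarding point 1: the state keys and values are UnsafeRows described by StructTypes derived from the query. For a word-count style aggregation the schemas would look roughly like this (illustrative; not taken from an actual plan):

import org.apache.spark.sql.types._

// keySchema: the grouping columns; valueSchema: the aggregation result row
val keySchema   = new StructType().add("word", StringType)
val valueSchema = new StructType().add("count", LongType)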

StateStoreRDD is a logical RDD, in the sense that its data actually comes from the history + the new batch.

  • Its partitions are the partitions of the new batch:
override protected def getPartitions: Array[Partition] = dataRDD.partitions
  • preferredLocation selection:
    For a partition p1 -> compute the storeId of its corresponding historical state store -> ask the StateStoreCoordinator for that storeId's location. (Note: this is only a locality hint and may be absent.)
    A StateStoreId is uniquely determined by (checkpointLocation, operatorId, partition.index).
override def getPreferredLocations(partition: Partition): Seq[String] = {
    val storeId = StateStoreId(checkpointLocation, operatorId, partition.index)
    storeCoordinator.flatMap(_.getLocation(storeId)).toSeq
  }
  • the compute process:
override def compute(partition: Partition, ctxt: TaskContext): Iterator[U] = {
    var store: StateStore = null
    val storeId = StateStoreId(checkpointLocation, operatorId, partition.index)
    store = StateStore.get(
      storeId, keySchema, valueSchema, storeVersion, storeConf, confBroadcast.value.value)
    val inputIter = dataRDD.iterator(partition, ctxt)
    storeUpdateFunction(store, inputIter)
  }

※ Obtain the store based on the storeId, key/value schemas, version, and other information (StateStore#get):

  def get(
      storeId: StateStoreId,
      keySchema: StructType,
      valueSchema: StructType,
      version: Long,
      storeConf: StateStoreConf,
      hadoopConf: Configuration): StateStore = {
    require(version >= 0)
    val storeProvider = loadedProviders.synchronized {
      startMaintenanceIfNeeded()
      val provider = loadedProviders.getOrElseUpdate(
        storeId,
        new HDFSBackedStateStoreProvider(storeId, keySchema, valueSchema, storeConf, hadoopConf))
      reportActiveStoreInstance(storeId)
      provider
    }
    storeProvider.getStore(version)
  }

→ storeProvider.getStore(version)
The store is backed by type MapType = java.util.concurrent.ConcurrentHashMap[UnsafeRow, UnsafeRow];
loadMap reads the data for the requested version from HDFS into this map.

  override def getStore(version: Long): StateStore = synchronized {
    require(version >= 0, "Version cannot be less than 0")
    val newMap = new MapType()
    if (version > 0) {
      newMap.putAll(loadMap(version))
    }
    val store = new HDFSBackedStateStore(version, newMap)
    logInfo(s"Retrieved version $version of ${HDFSBackedStateStoreProvider.this} for update")
    store
  }
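
To summarize the versioned-map idea behind HDFSBackedStateStoreProvider, here is a heavily simplified sketch; the names, types, and structure are illustrative and not the actual Spark implementation. Each committed version is a complete key -> value map; getStore(v) starts the next version from a copy of version v, and committing freezes the result as version v + 1 (the real provider persists delta files to the checkpoint location on HDFS and compacts them during maintenance).

import scala.collection.mutable

// Toy provider: version 0 is the empty state
class ToyStateStoreProvider {
  private val committed = mutable.Map[Long, Map[String, Int]](0L -> Map.empty)

  // analogous to getStore(version): copy the committed map of `version`
  // as the working map for the next update
  def getStore(version: Long): mutable.Map[String, Int] = {
    require(version >= 0, "Version cannot be less than 0")
    mutable.Map(committed(version).toSeq: _*)
  }

  // analogous to store.commit(): freeze the working map as version + 1
  // (the real provider would write a delta file to HDFS here)
  def commit(version: Long, updated: mutable.Map[String, Int]): Long = {
    committed(version + 1) = updated.toMap
    version + 1
  }
}

// usage: batch N reads version N, updates it, and commits version N + 1
val provider = new ToyStateStoreProvider
val working  = provider.getStore(0L)
working.put("a", 3)
val newVersion = provider.commit(0L, working)   // newVersion == 1
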
Original article: https://www.cnblogs.com/luweiseu/p/7735821.html