akka cluster sharding source code study (1/5): the stand-in pattern

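To make a project support clustering, I taught myself akka cluster and rolled it out in the project. From then on, life got a bit painful. Throw in apache as a reverse proxy and load balancer, and debugging becomes a real "treat". Even now I am not very familiar with the logs akka cluster prints. There is still little information about akka cluster online, so anyone who wants to understand it in depth has to read the source code. A few days ago, the folks at Xueqiu (雪球) said they don't recommend akka because it has many pitfalls. That was a wake-up call: my understanding of akka is far from sufficient, and I need to study it in depth.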

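akka cluster sharding is an akka extension. Around 2012, people on the Google Group began discussing the idea of a "dedicated actor for each entity". After a lot of discussion, it was eventually implemented by Patrik Nordwall and added to the akka-contrib library as an experimental module. At first I didn't know it existed, and I had even considered implementing something like it myself. I have never benchmarked cluster sharding and am not sure how to. http://dcaoyuan.github.io/papers/rpi_cluster/benchmark.html reports a benchmark on a Raspberry Pi cluster reaching about 1000 qps on a single node, and I would really like to study its benchmark code.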

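This first post looks at how cluster sharding uses the stand-in pattern. First, what is the stand-in pattern? After receiving a request, an actor may need to carry out a fairly complex operation, and aggregation is a typical case. For example, to learn the state of each replica node, the primary node pings every replica, collects their replies, and records which ones are alive. In a scenario like this it makes sense to create a new actor dedicated to that one job. This has several advantages. First, the primary actor moves this logic into another actor and keeps its own logic uncluttered. An actor has only a single receive function, and once it accumulates many cases it becomes messy. Second, if the job is handed to another actor, a restart of that actor after an exception has little impact on the rest of the system, because the job can simply be redone. In short, the stand-in pattern means creating a stand-in actor to do one specific job.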
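The sketch below makes the replica-ping example concrete using classic akka actors. All message and class names (Ping, Pong, CheckReplicas, ReplicaStatus, Replica, StatusCollector, Primary) are hypothetical. The one-second timeout, after which silent replicas count as dead, is also an assumption and not something the text specifies.

```scala
import akka.actor._
import scala.concurrent.duration._

// Hypothetical protocol for this sketch; none of these are akka APIs.
case object Ping
case object Pong
case object CheckReplicas
final case class ReplicaStatus(alive: Set[ActorRef], dead: Set[ActorRef])

// A replica that simply answers every Ping with a Pong.
class Replica extends Actor {
  def receive: Receive = { case Ping => sender() ! Pong }
}

// The stand-in: created for one request, it pings every replica, collects the
// replies, reports once to replyTo and then stops itself.
class StatusCollector(replicas: Set[ActorRef], replyTo: ActorRef, timeout: FiniteDuration)
    extends Actor {
  import context.dispatcher

  replicas.foreach(_ ! Ping)
  // Assumed policy: stop waiting after `timeout`; replicas that never answered are reported as dead.
  private val timer = context.system.scheduler.scheduleOnce(timeout, self, ReceiveTimeout)
  private var alive = Set.empty[ActorRef]
  if (replicas.isEmpty) finish()

  def receive: Receive = {
    case Pong =>
      alive += sender()
      if (alive == replicas) finish()
    case ReceiveTimeout => finish()
  }

  private def finish(): Unit = {
    timer.cancel()
    replyTo ! ReplicaStatus(alive, replicas -- alive)
    context.stop(self)
  }
}

object StatusCollector {
  def props(replicas: Set[ActorRef], replyTo: ActorRef, timeout: FiniteDuration): Props =
    Props(new StatusCollector(replicas, replyTo, timeout))
}

// The primary keeps its own receive small: each check is delegated to a fresh stand-in.
class Primary(replicas: Set[ActorRef]) extends Actor {
  def receive: Receive = {
    case CheckReplicas => context.actorOf(StatusCollector.props(replicas, sender(), 1.second))
  }
}
```

Each CheckReplicas spawns its own StatusCollector. Concurrent checks therefore never share state, and a failure inside one collector stays confined to that single check.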

In cluster sharding, two pieces of logic use the stand-in pattern. The first is HandOffStopper, which stops all entities of a shard during hand-off.

/**
 * INTERNAL API. Sends stopMessage (e.g. `PoisonPill`) to the entities and when all of
 * them have terminated it replies with `ShardStopped`.
 */
private[akka] class HandOffStopper(shard: String, replyTo: ActorRef, entities: Set[ActorRef], stopMessage: Any)
  extends Actor {
  import ShardCoordinator.Internal.ShardStopped

  entities.foreach { a ⇒
    context watch a
    a ! stopMessage
  }

  var remaining = entities

  def receive = {
    case Terminated(ref) ⇒
      remaining -= ref
      if (remaining.isEmpty) {
        replyTo ! ShardStopped(shard)
        context stop self
      }
  }
}
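HandOffStopper is private[akka] and replies with an internal message, so it cannot be reused directly. Below is a standalone sketch of the same watch-and-count idea. EntityStopper and EntitiesStopped are hypothetical names. There is one deliberate difference from the original. Given an empty entity set, HandOffStopper would wait forever for a Terminated that never arrives, so this version replies immediately in that case.

```scala
import akka.actor._

// Hypothetical reply standing in for ShardCoordinator.Internal.ShardStopped.
final case class EntitiesStopped(shard: String)

class EntityStopper(shard: String, replyTo: ActorRef, entities: Set[ActorRef], stopMessage: Any)
    extends Actor {
  // Watch each entity so its death comes back as Terminated, then ask it to stop.
  entities.foreach { e =>
    context.watch(e)
    e ! stopMessage
  }

  private var remaining = entities
  // Unlike HandOffStopper, do not wait forever when there is nothing to stop.
  if (remaining.isEmpty) finish()

  def receive: Receive = {
    case Terminated(ref) =>
      remaining -= ref
      if (remaining.isEmpty) finish()
  }

  private def finish(): Unit = {
    replyTo ! EntitiesStopped(shard)
    context.stop(self)
  }
}
```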

The second use, RebalanceWorker, also serves hand-off, this time driving the whole rebalancing of a shard.

/**
 * INTERNAL API. Rebalancing process is performed by this actor.
 * It sends `BeginHandOff` to all `ShardRegion` actors followed by
 * `HandOff` to the `ShardRegion` responsible for the shard.
 * When the handoff is completed it sends [[RebalanceDone]] to its
 * parent `ShardCoordinator`. If the process takes longer than the
 * `handOffTimeout` it also sends [[RebalanceDone]].
 */
private[akka] class RebalanceWorker(shard: String, from: ActorRef, handOffTimeout: FiniteDuration,
                                    regions: Set[ActorRef]) extends Actor {
  import Internal._
  regions.foreach(_ ! BeginHandOff(shard))
  var remaining = regions

  import context.dispatcher
  context.system.scheduler.scheduleOnce(handOffTimeout, self, ReceiveTimeout)

  def receive = {
    case BeginHandOffAck(`shard`) ⇒
      remaining -= sender()
      if (remaining.isEmpty) {
        from ! HandOff(shard)
        context.become(stoppingShard, discardOld = true)
      }
    case ReceiveTimeout ⇒ done(ok = false)
  }

  def stoppingShard: Receive = {
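    // Backticks match only this worker's shard; a bare `shard` would bind a new
    // variable, shadow the field and accept a ShardStopped for any shard.
    case ShardStopped(`shard`) ⇒ done(ok = true)
    case ReceiveTimeout        ⇒ done(ok = false)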
  }

  def done(ok: Boolean): Unit = {
    context.parent ! RebalanceDone(shard, ok)
    context.stop(self)
  }
}
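The same two-phase structure can be sketched outside akka's internals. The messages Prepare/PrepareAck, Release/Released and MoveDone, the toy Region actor, and the reportTo parameter are all hypothetical. They stand in for BeginHandOff/BeginHandOffAck, HandOff/ShardStopped, RebalanceDone and context.parent respectively. reportTo replaces the parent only so that the sketch runs without a ShardCoordinator.

```scala
import akka.actor._
import scala.concurrent.duration._

// Hypothetical messages; they are not akka's internal classes.
final case class Prepare(shard: String)
final case class PrepareAck(shard: String)
final case class Release(shard: String)
final case class Released(shard: String)
final case class MoveDone(shard: String, ok: Boolean)

// Toy region: acks Prepare and "releases" at once (a real region would first
// stop the shard's entities, e.g. with a stopper like HandOffStopper).
class Region extends Actor {
  def receive: Receive = {
    case Prepare(s) => sender() ! PrepareAck(s)
    case Release(s) => sender() ! Released(s)
  }
}

class MoveWorker(shard: String, owner: ActorRef, regions: Set[ActorRef],
                 timeout: FiniteDuration, reportTo: ActorRef) extends Actor {
  import context.dispatcher

  // Phase 1: every region must acknowledge before the shard is moved.
  regions.foreach(_ ! Prepare(shard))
  // A single deadline covers both phases.
  private val deadline = context.system.scheduler.scheduleOnce(timeout, self, ReceiveTimeout)
  private var remaining = regions

  def receive: Receive = {
    case PrepareAck(`shard`) =>
      remaining -= sender()
      if (remaining.isEmpty) {
        // Phase 2: only the owning region is asked to give the shard up.
        owner ! Release(shard)
        context.become(releasing)
      }
    case ReceiveTimeout => done(ok = false)
  }

  def releasing: Receive = {
    case Released(`shard`) => done(ok = true)
    case ReceiveTimeout    => done(ok = false)
  }

  private def done(ok: Boolean): Unit = {
    deadline.cancel()
    reportTo ! MoveDone(shard, ok)
    context.stop(self)
  }
}
```

A one-shot scheduleOnce bounds the total time of the rebalance, whichever phase it gets stuck in. context.setReceiveTimeout, by contrast, measures idle time between messages. That is a plausible reason the original schedules its own ReceiveTimeout instead.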