Consumer Client Re-Design (Translation)

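Note: a major change in Kafka 0.9 is the redesign of the consumer and producer APIs.

This Kafka document outlines the features that the consumer API redesign set out to deliver. Version 0.9 did implement these features; the details are covered in several other documents, which I will translate later.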

Motivation

We've received quite a lot of feedback on the consumer side features over the past few months. Some of them are improvements to the current consumer design and some are simply new feature/API requests. I have attempted to write up the requirements that I've heard on this wiki - Kafka 0.9 Consumer Rewrite Design
This would involve some significant changes to the consumer APIs, so we would like to collect feedback on the proposal from our community. Since the list of changes is not small, we would like to understand if some features are preferred over others, and more importantly, if some features are not required at all.

Thin consumer client:

We have a lot of users who have expressed interest in using and writing non-Java clients. Currently, this is pretty straightforward for the SimpleConsumer but not for the high level consumer. The high level consumer does some complex failure detection and rebalancing, which is non-trivial to re-implement correctly.
The goal is to have a very thin consumer client, with minimum dependencies to make this easy for the users.

Central co-ordination:

The current version of the high level consumer suffers from herd and split brain problems, where multiple consumers in a group run a distributed algorithm to agree on the same partition ownership decision. Because they have different views of the Zookeeper data, they run into conflicts that make the rebalancing attempt fail. But there is no way for a consumer to verify whether a rebalancing operation completed successfully on the entire group. This also leads to some potential bugs in the rebalancing logic, for example, https://issues.apache.org/jira/browse/KAFKA-242
This can be remedied by moving the failure detection and rebalancing logic to a centralized, highly available co-ordinator - Kafka 0.9 Consumer Rewrite Design

We think the first two requirements are prerequisites for the rest. So we have proceeded by trying to design a centralized coordinator for consumer rebalancing without Zookeeper; for details please read here.

Allow manual partition assignment

There are a number of stateful data systems that would like to manually assign partitions to consumers. The main motive is to enable them to keep some local per-partition state, since the mapping from their consumer to partition never changes. There are also some use cases where it makes sense to co-locate brokers and consumer processes, so it would be nice to optimize the automatic partition assignment algorithm to consider co-location. Examples of such systems are databases, search indexers, etc.
A side effect of this requirement is wanting to turn off automatic rebalancing in the high level consumer.
This feature depends on the central co-ordination feature since it cannot be correctly and easily implemented with the current distributed co-ordination model.

Allow manual offset management

Some systems require offset management in a custom database, at specific intervals. Overall, the requirement is to have access to the message metadata like topic, partition, offset of the message, and to be able to provide per-partition offsets on consumer startup.
This would require designing new consumer APIs that allow providing offsets on startup and return message metadata with the consumer iterator.
One thing that needs to be thought through is whether the consumer client can be allowed to pick manual offset management for some, but not all, topics. One option is to allow the consumer to pick only one offset management mode. This could potentially make the API a bit simpler.
This feature depends on the central co-ordination feature since it cannot be correctly and easily implemented with the current distributed co-ordination model.

Invocation of user specified callback on rebalance

Some applications maintain transient per-partition state in-memory. On rebalance operation, they would need to “flush” the transient state to some persistent storage.
The requirement is to let the user plug in some sort of callback that the high level consumer invokes when a rebalance operation is triggered.
This requirement has some overlap with the manual partition assignment requirement. Probably, if we allow manual partition assignment, such applications might be able to leverage that to flush transient state. But, the issue is that these applications do want automatic rebalancing and might not want to use the manual partition assignment feature.

Non blocking consumer APIs

This requirement comes from stream processing applications that implement high-level stream processing primitives like filter by, group by, and join operations on Kafka streams.
To facilitate stream join operations, it is desirable that Kafka provides non-blocking consumer APIs. Today, since the consumer streams are essentially blocking, these sorts of stream join operations are not possible.
This requirement seems to involve some significant redesign of the consumer APIs and the consumer stream logic. So it will be good to give this some more thought.

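Motivation

Over the past few months we have received a lot of feedback on consumer-side features. Some of it concerns improvements to the current consumer design, and some of it requests new features or APIs. I have tried to collect the requirements I have heard on this wiki: Kafka 0.9 Consumer Rewrite Design

This involves significant changes to the consumer API, so we want to collect feedback on the proposal from the community. Since the list of changes is not small, we need to understand whether some features are preferred over others and, more importantly, whether some features are not needed at all.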

Central co-ordination:

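(It is called central co-ordination in contrast to the current distributed co-ordination, which is done through Zookeeper.)

1. The high level consumer is prone to herd and split brain problems (when consumers run a distributed algorithm to decide partition ownership). In addition, the high level consumer easily runs into conflicts during a rebalance, which makes the rebalance fail. Yet a single consumer in the consumer group has no way of knowing whether a rebalance operation succeeded across the whole group. This also leads to some potential bugs in the rebalance logic, for example https://issues.apache.org/jira/browse/KAFKA-242

2. Failure detection and rebalancing therefore need to move to a centralized, highly available co-ordinator (Kafka 0.9 Consumer Rewrite Design).

Solving these two problems is a prerequisite for handling the ones below, so a central co-ordination system that no longer uses Zookeeper for consumer rebalancing needs to be designed.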
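The idea of a single decision point can be illustrated with a toy model. The sketch below is not Kafka's coordinator protocol; the `Coordinator` class, its method names, and the round-robin assignment are all assumptions made for illustration. It only shows why conflicts disappear when one component owns membership and computes the assignment: every consumer reads the same result, tagged with a generation number.

```python
from collections import defaultdict


class Coordinator:
    """Toy in-memory stand-in for a centralized group coordinator (hypothetical)."""

    def __init__(self, partitions):
        self.partitions = sorted(partitions)
        self.members = set()
        self.generation = 0   # bumped on every rebalance
        self.assignment = {}  # consumer id -> list of partitions

    def _rebalance(self):
        # One place decides ownership, so members cannot disagree.
        self.generation += 1
        assignment = defaultdict(list)
        members = sorted(self.members)
        for i, partition in enumerate(self.partitions):
            if members:
                assignment[members[i % len(members)]].append(partition)
        self.assignment = {m: assignment[m] for m in members}

    def join(self, consumer_id):
        self.members.add(consumer_id)
        self._rebalance()

    def leave(self, consumer_id):
        # Called on clean shutdown, or when failure detection times a member out.
        self.members.discard(consumer_id)
        self._rebalance()


coordinator = Coordinator(partitions=range(4))
coordinator.join("c1")
coordinator.join("c2")
print(coordinator.generation, coordinator.assignment)
coordinator.leave("c2")
print(coordinator.generation, coordinator.assignment)
```

A consumer in this model never computes ownership itself. It only asks the coordinator for its current partitions, which is also what keeps the client thin.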

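Allow manual assignment of partitions to consumers

1. In some cases users want to assign partitions to consumers manually. The main reason is that when the mapping between consumers and partitions never changes, a consumer can maintain local per-partition state. In other cases users want to co-locate broker and consumer processes, so the automatic partition assignment algorithm needs to take co-location into account. Such systems include databases, search indexers, and so on.

2. A side effect of this feature is the need to be able to turn off automatic rebalancing in the high level consumer.

3. This feature depends on the central co-ordination feature, because it is not easy to implement in the current distributed co-ordination model.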
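As a rough illustration of why a fixed mapping matters for stateful systems, the sketch below uses a hypothetical `STATIC_ASSIGNMENT` table and `StatefulConsumer` class (neither is a Kafka API). Because ownership never moves, the consumer can keep per-partition state locally without having to hand it over to anyone.

```python
# Hypothetical static mapping chosen by the application, not by a rebalance.
STATIC_ASSIGNMENT = {
    "indexer-0": [0, 1],
    "indexer-1": [2, 3],
}


class StatefulConsumer:
    def __init__(self, consumer_id):
        self.partitions = STATIC_ASSIGNMENT[consumer_id]
        # Safe to keep local per-partition state: ownership never moves.
        self.state = {p: 0 for p in self.partitions}

    def handle(self, partition, message):
        if partition not in self.state:
            raise ValueError(f"partition {partition} is not assigned to this consumer")
        # Example state: total bytes seen per partition.
        self.state[partition] += len(message)


consumer = StatefulConsumer("indexer-0")
consumer.handle(0, b"abc")
consumer.handle(1, b"de")
print(consumer.state)
```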

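Allow manual offset management

Some systems need to manage offsets in their own database, at specified intervals. In general, this requires that users can access message metadata such as topic, partition, and offset, and can specify an offset for each partition when the consumer starts.
This calls for new consumer APIs that allow setting offsets at startup and that return message metadata through the consumer iterator.
One case that needs careful thought is whether a consumer may use manual offset management for some topics and automatic offset management for others. One option is to let a consumer choose only one offset management mode, which would make the API somewhat simpler.
This feature depends on the central co-ordination feature, because the current distributed co-ordination model cannot implement it correctly and easily.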
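A minimal sketch of the pattern described above, under loud assumptions: `LOG` is a fake in-memory partition standing in for a broker, `fetch`, `load_offset`, `save_offset`, and `consume` are hypothetical helpers, and an in-memory SQLite database plays the role of the custom offset store. The two things it demonstrates are the ones the requirement names: each message carries its topic, partition, and offset, and the consumer supplies its own starting offset on startup.

```python
import sqlite3
from typing import NamedTuple


class Message(NamedTuple):
    topic: str
    partition: int
    offset: int
    value: bytes


# Fake partition log standing in for a broker (illustrative data only).
LOG = {("events", 0): [b"a", b"b", b"c", b"d"]}


def fetch(topic, partition, start_offset):
    """Yield messages with their metadata, starting at start_offset."""
    for offset, value in enumerate(LOG[(topic, partition)]):
        if offset >= start_offset:
            yield Message(topic, partition, offset, value)


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE offsets (topic TEXT, part INTEGER, next_offset INTEGER, "
    "PRIMARY KEY (topic, part))"
)


def load_offset(topic, partition):
    row = db.execute(
        "SELECT next_offset FROM offsets WHERE topic = ? AND part = ?",
        (topic, partition),
    ).fetchone()
    return row[0] if row else 0


def save_offset(message):
    # Store the offset of the next message to read.
    db.execute(
        "INSERT OR REPLACE INTO offsets VALUES (?, ?, ?)",
        (message.topic, message.partition, message.offset + 1),
    )
    db.commit()


def consume(topic, partition, limit):
    start = load_offset(topic, partition)  # offset provided on startup
    seen = []
    for message in fetch(topic, partition, start):
        if len(seen) == limit:
            break
        seen.append(message.value)
        save_offset(message)
    return seen


first_run = consume("events", 0, limit=2)   # starts at offset 0
second_run = consume("events", 0, limit=2)  # resumes from the stored offset
print(first_run, second_run)
```

Saving after every message keeps the example short; a system that commits "at specific intervals" would call `save_offset` less often and accept re-reading a few messages after a restart.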

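Allow a user-specified callback on rebalance

Some applications keep transient per-partition state in memory. When a rebalance happens, they need to flush this transient state to persistent storage.
This requires letting the user specify a callback that is invoked when a rebalance operation is triggered.
This requirement overlaps somewhat with manual partition assignment. If we allow manual partition assignment, applications might be able to use it to flush transient state. The problem is that some applications do want automatic rebalancing and do not want to use manual partition assignment.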
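The callback requirement can be sketched as follows. `ToyConsumer`, `on_revoked`, and `on_assigned` are hypothetical names chosen for this example, and the two dictionaries stand in for in-memory state and persistent storage. The point is only the ordering: the application gets a chance to flush before ownership changes, and to reload after it.

```python
class ToyConsumer:
    """Hypothetical consumer that notifies the application around a rebalance."""

    def __init__(self, on_revoked, on_assigned):
        self.on_revoked = on_revoked
        self.on_assigned = on_assigned
        self.partitions = []

    def rebalance(self, new_partitions):
        # Let the application flush state before ownership changes.
        self.on_revoked(list(self.partitions))
        self.partitions = list(new_partitions)
        self.on_assigned(list(self.partitions))


durable_store = {}  # stands in for persistent storage
in_memory = {}      # transient per-partition state (message counts)


def flush(revoked):
    for partition in revoked:
        if partition in in_memory:
            durable_store[partition] = in_memory.pop(partition)


def restore(assigned):
    for partition in assigned:
        in_memory[partition] = durable_store.get(partition, 0)


consumer = ToyConsumer(on_revoked=flush, on_assigned=restore)
consumer.rebalance([0, 1])
in_memory[0] += 5
in_memory[1] += 2
consumer.rebalance([1])  # partition 0 moves to another consumer
print(durable_store, in_memory)
```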

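Provide non-blocking consumer APIs

This requirement comes from stream processing applications that want to implement high-level stream processing primitives on Kafka streams, such as filter by, group by, and join.
To make stream joins easier, Kafka needs to provide non-blocking consumer APIs. Today, because consumer streams are essentially blocking, such stream join operations are not possible.
This requirement appears to call for a considerable redesign of the consumer APIs and the consumer stream logic, so it is worth spending more time thinking it through.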
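To see why blocking reads get in the way of a join, consider two streams where one side is temporarily empty. The sketch below is an assumption-laden toy: `queue.Queue` objects stand in for consumer streams, and `poll` and `join_streams` are hypothetical helpers. With a blocking read, the loop would stall on the empty stream; with a non-blocking poll, it keeps draining the other side and emits matches as they appear.

```python
import queue


def poll(stream):
    """Return the next record, or None right away if the stream is empty."""
    try:
        return stream.get_nowait()
    except queue.Empty:
        return None


def join_streams(left, right, max_idle_rounds=3):
    """Join two streams on key by alternating non-blocking polls."""
    pending_left, pending_right, joined = {}, {}, []
    idle = 0
    while idle < max_idle_rounds:
        progressed = False
        record = poll(left)
        if record is not None:
            progressed = True
            key, value = record
            if key in pending_right:
                joined.append((key, value, pending_right.pop(key)))
            else:
                pending_left[key] = value
        record = poll(right)
        if record is not None:
            progressed = True
            key, value = record
            if key in pending_left:
                joined.append((key, pending_left.pop(key), value))
            else:
                pending_right[key] = value
        idle = 0 if progressed else idle + 1
    return joined


clicks, views = queue.Queue(), queue.Queue()
for record in [("u1", "click-a"), ("u2", "click-b")]:
    clicks.put(record)
views.put(("u2", "view-x"))  # the right stream has fewer records than the left

print(join_streams(clicks, views))
```

The `max_idle_rounds` cutoff exists only so the example terminates; a real stream processor would keep polling.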

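Summary:

Overall, the new consumer API

1. is more robust in distributed co-ordination

2. gives users more metadata about messages

3. lets users manage offsets and partition assignment themselves

4. provides a non-blocking API

5. To achieve this, it replaces the previous distributed co-ordination system with a central co-ordination system.

As a result, the new consumer API can fully replace the previous high level consumer while also offering some features that only the simple API could provide before (but hiding some of the complexity of using the simple API directly), so it will be a more general-purpose API.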