Disruptor Series (Part 2): Using the Disruptor

This article is translated from the Getting Started page of the Disruptor wiki on GitHub.

### Getting the Disruptor

The Disruptor jar is available from the Maven repository (mvnrepository) and can be added to your project's dependency management:

<dependency>
  <groupId>com.lmax</groupId>
  <artifactId>disruptor</artifactId>
  <version>3.4.2</version>
</dependency>

### Writing Event Producers and Consumers

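To get started with the Disruptor, we will use a very simple example: a producer passes a single long value to a consumer. The consumer logic is kept trivial and only prints the value it receives. First, define the Event that carries the data: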

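public class LongEvent
{
    private long value;

    public void set(long value)
    {
        this.value = value;
    }

    // Lets the handler print the value instead of the default Object.toString() output
    @Override
    public String toString()
    {
        return "LongEvent{value=" + value + "}";
    }
}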

To let the Disruptor preallocate these events for us, we need an EventFactory that constructs them:

import com.lmax.disruptor.EventFactory;

public class LongEventFactory implements EventFactory<LongEvent>
{
    public LongEvent newInstance()
    {
        return new LongEvent();
    }
}
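To make the idea of preallocation concrete, here is a minimal sketch that uses only the JDK. It is not the Disruptor's implementation; `PreallocatedRing` and `PreallocationDemo` are hypothetical names invented for this illustration. Every slot is created once by the factory and then reused, so handing out an event never allocates.

```java
import java.util.function.Supplier;

// Hypothetical illustration of preallocation; not the Disruptor's actual code.
final class PreallocatedRing<E> {
    private final Object[] slots;
    private final int mask;

    PreallocatedRing(Supplier<E> factory, int size) {
        if (size < 1 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of 2");
        }
        slots = new Object[size];
        mask = size - 1;
        for (int i = 0; i < size; i++) {
            slots[i] = factory.get();   // allocate every event once, up front
        }
    }

    // A power-of-2 size lets the slot index be computed with a cheap bit mask.
    @SuppressWarnings("unchecked")
    E get(long sequence) {
        return (E) slots[(int) (sequence & mask)];
    }
}

final class PreallocationDemo {
    public static void main(String[] args) {
        PreallocatedRing<long[]> ring = new PreallocatedRing<long[]>(() -> new long[1], 8);
        // Sequences 3 and 11 map to the same reused slot because 11 & 7 == 3.
        System.out.println(ring.get(3) == ring.get(11));   // prints true
    }
}
```

The mask trick is one plausible reason a ring buffer size would be restricted to powers of 2, since it replaces a modulo with a single bitwise AND.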

Once the event is defined, we need a consumer to handle it. In this example the handler simply prints the value to the console:

import com.lmax.disruptor.EventHandler;

public class LongEventHandler implements EventHandler<LongEvent>
{
    public void onEvent(LongEvent event, long sequence, boolean endOfBatch)
    {
        System.out.println("Event: " + event);
    }
}

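With the consumer in place, we also need a producer to generate events. For simplicity, assume the data comes from an I/O source such as the network or a file. Different versions of the Disruptor offer different ways to write a producer.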

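Since version 3.0, the Disruptor has offered a richer lambda-style API that keeps the publishing complexity inside the RingBuffer. From 3.0 onward, the preferred way to publish messages is therefore the Event Publisher/Event Translator API: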

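import com.lmax.disruptor.EventTranslatorOneArg;
import com.lmax.disruptor.RingBuffer;
import java.nio.ByteBuffer;

public class LongEventProducerWithTranslator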
{
    private final RingBuffer<LongEvent> ringBuffer;
    
    public LongEventProducerWithTranslator(RingBuffer<LongEvent> ringBuffer)
    {
        this.ringBuffer = ringBuffer;
    }
    
    private static final EventTranslatorOneArg<LongEvent, ByteBuffer> TRANSLATOR =
        new EventTranslatorOneArg<LongEvent, ByteBuffer>()
        {
            public void translateTo(LongEvent event, long sequence, ByteBuffer bb)
            {
                event.set(bb.getLong(0));
            }
        };

    public void onData(ByteBuffer bb)
    {
        ringBuffer.publishEvent(TRANSLATOR, bb);
    }
}

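Another advantage of this approach is that the translator code can be pulled out into a separate class and unit tested in isolation. The Disruptor provides several interfaces (EventTranslator, EventTranslatorOneArg, EventTranslatorTwoArg, etc.) that can be implemented to supply translators. They exist so that translators can be written as static classes or non-capturing lambdas: the arguments to the translation method are passed through the call on the RingBuffer to the translator.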

The alternative is to build the producer with the legacy API from before version 3.0, which is more low-level:

import com.lmax.disruptor.RingBuffer;
import java.nio.ByteBuffer;

public class LongEventProducer
{
    private final RingBuffer<LongEvent> ringBuffer;

    public LongEventProducer(RingBuffer<LongEvent> ringBuffer)
    {
        this.ringBuffer = ringBuffer;
    }

    public void onData(ByteBuffer bb)
    {
        long sequence = ringBuffer.next();  // Grab the next sequence
        try
        {
            LongEvent event = ringBuffer.get(sequence); // Get the entry in the Disruptor
                                                        // for the sequence
            event.set(bb.getLong(0));  // Fill with data
        }
        finally
        {
            ringBuffer.publish(sequence);
        }
    }
}

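As the code shows, publishing an event is more involved than using a simple queue, because the events are preallocated. Publishing takes two phases: first claim a slot in the RingBuffer, then publish the data so that it becomes available. The publication must be wrapped in a try/finally block. Once a slot has been claimed with ringBuffer.next(), that sequence must be published, so the slot is filled inside the try block and the sequence is published in the finally block. Failing to do so can corrupt the state of the Disruptor. In the multi-producer case in particular, it cannot recover without a restart. For this reason the EventTranslator API is the recommended way to write a producer.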
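The two phases can be sketched with a toy, JDK-only ring. This is an assumption-laden illustration rather than the Disruptor's code: `TwoPhaseRing`, `TwoPhaseDemo`, and their methods are hypothetical names, only a single producer is supported, and nothing stops the producer from wrapping over slots a consumer has not read yet.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-producer sketch of claim-then-publish; not the Disruptor's code.
final class TwoPhaseRing {
    private final long[] values;
    private final int mask;
    private long claimed = -1;                              // touched only by the producer
    private final AtomicLong cursor = new AtomicLong(-1);   // highest published sequence

    TwoPhaseRing(int sizePowerOfTwo) {
        values = new long[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    long next() { return ++claimed; }                       // phase 1: claim a slot

    void set(long sequence, long value) { values[(int) (sequence & mask)] = value; }

    void publish(long sequence) { cursor.set(sequence); }   // phase 2: make the slot visible

    long published() { return cursor.get(); }

    long get(long sequence) { return values[(int) (sequence & mask)]; }
}

final class TwoPhaseDemo {
    static void onData(TwoPhaseRing ring, long value) {
        long sequence = ring.next();
        try {
            ring.set(sequence, value);
        } finally {
            ring.publish(sequence);   // always publish what was claimed
        }
    }

    public static void main(String[] args) {
        TwoPhaseRing ring = new TwoPhaseRing(4);
        onData(ring, 42L);
        System.out.println(ring.published() + " -> " + ring.get(0));   // prints 0 -> 42
    }
}
```

If `publish` were skipped after an exception in the try block, the cursor would never reach the claimed sequence, and any reader waiting on it would stall. That is the failure the try/finally guards against.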

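The final step is to wire everything together. The components can be wired up by hand, but that can get complicated, so a DSL is provided to simplify construction. The DSL makes assembly easier at the cost of fine-grained control over some options, yet it suits most situations: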

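import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;
import java.nio.ByteBuffer;

public class LongEventMain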
{
    public static void main(String[] args) throws Exception
    {
        // The factory for the event
        LongEventFactory factory = new LongEventFactory();

        // Specify the size of the ring buffer, must be power of 2.
        int bufferSize = 1024;

        // Construct the Disruptor
        Disruptor<LongEvent> disruptor = new Disruptor<>(factory, bufferSize, DaemonThreadFactory.INSTANCE);

        // Connect the handler
        disruptor.handleEventsWith(new LongEventHandler());

        // Start the Disruptor, starts all threads running
        disruptor.start();

        // Get the ring buffer from the Disruptor to be used for publishing.
        RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();

        LongEventProducer producer = new LongEventProducer(ringBuffer);

        ByteBuffer bb = ByteBuffer.allocate(8);
        for (long l = 0; true; l++)
        {
            bb.putLong(0, l);
            producer.onData(bb);
            Thread.sleep(1000);
        }
    }
}

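One of the influences on the design of the Disruptor API is Java 8, which relies on functional interfaces to support lambdas. Most interfaces in the Disruptor API are defined as functional interfaces so that lambdas can be used in place of custom classes. The LongEventMain above can be simplified with lambdas: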

public class LongEventMain
{
    public static void main(String[] args) throws Exception
    {
        // Specify the size of the ring buffer, must be power of 2.
        int bufferSize = 1024;

        // Construct the Disruptor
        Disruptor<LongEvent> disruptor = new Disruptor<>(LongEvent::new, bufferSize, DaemonThreadFactory.INSTANCE);

        // Connect the handler
        disruptor.handleEventsWith((event, sequence, endOfBatch) -> System.out.println("Event: " + event));

        // Start the Disruptor, starts all threads running
        disruptor.start();

        // Get the ring buffer from the Disruptor to be used for publishing.
        RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();

        ByteBuffer bb = ByteBuffer.allocate(8);
        for (long l = 0; true; l++)
        {
            bb.putLong(0, l);
            ringBuffer.publishEvent((event, sequence, buffer) -> event.set(buffer.getLong(0)), bb);
            Thread.sleep(1000);
        }
    }
}

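Notice that a number of classes are no longer needed, such as the handler and the translator. Notice also that the lambda passed to publishEvent() only refers to the parameters that are passed in to it.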

However, suppose the code were written like this instead:

ByteBuffer bb = ByteBuffer.allocate(8);
for (long l = 0; true; l++)
{
    bb.putLong(0, l);
    ringBuffer.publishEvent((event, sequence) -> event.set(bb.getLong(0)));
    Thread.sleep(1000);
}

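This version uses a capturing lambda: the lambda body refers to the local variable bb, so an object may need to be instantiated to hold the ByteBuffer as the lambda is passed to publishEvent(). That can create extra garbage. If low GC pressure is a strict requirement, passing the argument through the call is the better choice.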
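The two styles can be compared side by side without the Disruptor jar. In this sketch `Translator`, `TranslatorOneArg`, and `LambdaCaptureDemo` are hypothetical stand-ins written for the illustration, and a `long[]` plays the role of the event. Both styles produce the same result; whether the capturing form really allocates on each call depends on the JVM.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-ins for the translator interfaces, so the sketch runs on the JDK alone.
interface Translator {
    void translateTo(long[] event, long sequence);
}

interface TranslatorOneArg<A> {
    void translateTo(long[] event, long sequence, A arg);
}

final class LambdaCaptureDemo {
    static void publish(long[] event, Translator translator) {
        translator.translateTo(event, 0L);
    }

    static <A> void publishWithArg(long[] event, TranslatorOneArg<A> translator, A arg) {
        translator.translateTo(event, 0L, arg);
    }

    static long viaCapturing(ByteBuffer bb) {
        long[] event = new long[1];
        // Capturing: the lambda body refers to the local variable bb.
        publish(event, (e, seq) -> e[0] = bb.getLong(0));
        return event[0];
    }

    static long viaArgument(ByteBuffer bb) {
        long[] event = new long[1];
        // Non-capturing: bb reaches the lambda as a parameter instead.
        publishWithArg(event, (e, seq, buffer) -> e[0] = buffer.getLong(0), bb);
        return event[0];
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(8);
        bb.putLong(0, 5L);
        System.out.println(viaCapturing(bb) + " " + viaArgument(bb));   // prints 5 5
    }
}
```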

Replacing the lambdas above with method references simplifies the example further:

public class LongEventMain
{
    public static void handleEvent(LongEvent event, long sequence, boolean endOfBatch)
    {
        System.out.println(event);
    }

    public static void translate(LongEvent event, long sequence, ByteBuffer buffer)
    {
        event.set(buffer.getLong(0));
    }

    public static void main(String[] args) throws Exception
    {
        // Specify the size of the ring buffer, must be power of 2.
        int bufferSize = 1024;

        // Construct the Disruptor
        Disruptor<LongEvent> disruptor = new Disruptor<>(LongEvent::new, bufferSize, DaemonThreadFactory.INSTANCE);

        // Connect the handler
        disruptor.handleEventsWith(LongEventMain::handleEvent);

        // Start the Disruptor, starts all threads running
        disruptor.start();

        // Get the ring buffer from the Disruptor to be used for publishing.
        RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();

        ByteBuffer bb = ByteBuffer.allocate(8);
        for (long l = 0; true; l++)
        {
            bb.putLong(0, l);
            ringBuffer.publishEvent(LongEventMain::translate, bb);
            Thread.sleep(1000);
        }
    }
}

Here the lambda passed to ringBuffer.publishEvent is replaced by a method reference, which trims the code a little more.

### Basic Tuning Options

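The approach above works for most use cases. However, if you can make certain assumptions about the hardware and software environment, you can tune the Disruptor's options for better performance. There are two main options to tune: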

single vs. multiple producers
alternative wait strategies

Single vs. Multiple Producers

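One of the best ways to improve performance in a concurrent system is to follow the Single Writer Principle, and this applies to the Disruptor as well. If your scenario has only a single producer, you can take advantage of that for extra performance: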

public class LongEventMain
{
    public static void main(String[] args) throws Exception
    {
        //.....
        // Construct the Disruptor with a SingleProducerSequencer
        Disruptor<LongEvent> disruptor = new Disruptor<>(
            factory, bufferSize, ProducerType.SINGLE, new BlockingWaitStrategy(), DaemonThreadFactory.INSTANCE);
        //.....
    }
}

To give a sense of how much performance this technique can gain, here are the results of the OneToOne performance test, run on an i7 Sandy Bridge MacBook Air:

Multiple Producer:

Run 0, Disruptor=26,553,372 ops/sec
Run 1, Disruptor=28,727,377 ops/sec
Run 2, Disruptor=29,806,259 ops/sec
Run 3, Disruptor=29,717,682 ops/sec
Run 4, Disruptor=28,818,443 ops/sec
Run 5, Disruptor=29,103,608 ops/sec
Run 6, Disruptor=29,239,766 ops/sec

Single Producer:

Run 0, Disruptor=89,365,504 ops/sec
Run 1, Disruptor=77,579,519 ops/sec
Run 2, Disruptor=78,678,206 ops/sec
Run 3, Disruptor=80,840,743 ops/sec
Run 4, Disruptor=81,037,277 ops/sec
Run 5, Disruptor=81,168,831 ops/sec
Run 6, Disruptor=81,699,346 ops/sec
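One way to picture where the gap comes from is the cost of claiming a sequence. The following JDK-only sketch is an assumption about the general idea, not the Disruptor's sequencer code; `SingleWriterClaim`, `MultiWriterClaim`, and `ClaimDemo` are hypothetical names. A single writer can bump a plain field, while multiple writers need an atomic read-modify-write that all producer threads contend on.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch contrasting the two claim styles; not the Disruptor's sequencers.
final class SingleWriterClaim {
    private long next = -1;   // written by exactly one thread, so no atomics are needed

    long claim() {
        return ++next;
    }
}

final class MultiWriterClaim {
    private final AtomicLong next = new AtomicLong(-1);

    long claim() {
        return next.incrementAndGet();   // atomic read-modify-write shared by all producers
    }
}

final class ClaimDemo {
    // Claims perThread sequences from each producer thread and
    // returns the highest sequence those threads were given.
    static long contendedClaims(int threads, int perThread) {
        MultiWriterClaim claim = new MultiWriterClaim();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    claim.claim();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return claim.claim() - 1;
    }

    public static void main(String[] args) {
        SingleWriterClaim single = new SingleWriterClaim();
        single.claim();
        System.out.println(single.claim());               // prints 1
        System.out.println(contendedClaims(4, 10_000));   // prints 39999
    }
}
```

The sketch only demonstrates correctness of both styles. It makes no claim about reproducing the throughput numbers above.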

Alternative Wait Strategies

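The default wait strategy used by the Disruptor is BlockingWaitStrategy. Internally it uses a typical lock and condition variable to handle thread wake-up. BlockingWaitStrategy is the slowest of the available wait strategies, but it is the most conservative in CPU usage and fits the widest range of scenarios. Choosing a different wait strategy can yield additional performance.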
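The lock-and-condition approach can be sketched as follows. This is a simplified illustration built on `java.util.concurrent.locks`, not the source of BlockingWaitStrategy; `BlockingCursor` and `BlockingWaitDemo` are hypothetical names.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of waiting on a lock and condition; not the Disruptor's code.
final class BlockingCursor {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition advanced = lock.newCondition();
    private long cursor = -1;   // guarded by lock

    // Consumer side: sleep until the cursor reaches the requested sequence.
    long waitFor(long sequence) throws InterruptedException {
        lock.lock();
        try {
            while (cursor < sequence) {
                advanced.await();
            }
            return cursor;
        } finally {
            lock.unlock();
        }
    }

    // Producer side: advance the cursor and wake any waiting consumers.
    void publish(long sequence) {
        lock.lock();
        try {
            cursor = sequence;
            advanced.signalAll();
        } finally {
            lock.unlock();
        }
    }
}

final class BlockingWaitDemo {
    static long run(long target) {
        BlockingCursor cursor = new BlockingCursor();
        new Thread(() -> cursor.publish(target)).start();
        try {
            return cursor.waitFor(target);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(5L));   // prints 5
    }
}
```

The waiting thread consumes no CPU while parked, but the producer pays for taking the lock and signalling on every publish.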

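1. SleepingWaitStrategy

Like BlockingWaitStrategy, SleepingWaitStrategy tries to be conservative with CPU usage. It uses a simple busy-wait loop but calls LockSupport.parkNanos(1) inside the loop, which pauses the thread for roughly 60us on a typical Linux system. Its benefit is that the producing thread needs to do nothing beyond incrementing the appropriate counter and does not pay the cost of signalling a condition variable. The trade-off is higher latency when passing data between producer and consumer. It is a very good strategy where low latency is not required. A common use case is asynchronous logging.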

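2. YieldingWaitStrategy

YieldingWaitStrategy is a wait strategy for low-latency systems. It trades CPU cycles for lower latency. It busy spins while waiting for the sequence to increment to the appropriate value, and inside the loop it calls Thread.yield() so that other queued threads can run. This strategy is strongly recommended when very high performance is needed and the number of event handler threads is smaller than the number of logical cores, for example when hyper-threading is enabled.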

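3. BusySpinWaitStrategy

BusySpinWaitStrategy is the highest-performing wait strategy, but it also places the tightest constraints on the environment. Use it only when the number of event handler threads is smaller than the number of physical cores.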
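The three non-blocking styles differ mainly in what the waiting thread does between checks. The sketch below is a rough JDK-only illustration and not the Disruptor's source; `WaitLoops` and `WaitLoopDemo` are hypothetical names, and an `AtomicLong` stands in for the published sequence.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketches of the three waiting styles; not the Disruptor's code.
final class WaitLoops {
    // Sleeping style: park briefly between checks. Low CPU use, higher latency.
    static long sleepingWait(AtomicLong cursor, long sequence) {
        long available;
        while ((available = cursor.get()) < sequence) {
            LockSupport.parkNanos(1L);
        }
        return available;
    }

    // Yielding style: keep spinning, but offer the core to other runnable threads.
    static long yieldingWait(AtomicLong cursor, long sequence) {
        long available;
        while ((available = cursor.get()) < sequence) {
            Thread.yield();
        }
        return available;
    }

    // Busy-spin style: never give up the core. Lowest latency, one core fully used.
    static long busySpinWait(AtomicLong cursor, long sequence) {
        long available;
        while ((available = cursor.get()) < sequence) {
            // Intentionally empty. On Java 9+ Thread.onSpinWait() could be called here.
        }
        return available;
    }
}

final class WaitLoopDemo {
    // Runs each loop against a producer thread that publishes 0..target.
    static long[] runAll(long target) {
        long[] seen = new long[3];
        for (int style = 0; style < 3; style++) {
            AtomicLong cursor = new AtomicLong(-1);
            new Thread(() -> {
                for (long s = 0; s <= target; s++) {
                    cursor.set(s);
                }
            }).start();
            if (style == 0) {
                seen[style] = WaitLoops.sleepingWait(cursor, target);
            } else if (style == 1) {
                seen[style] = WaitLoops.yieldingWait(cursor, target);
            } else {
                seen[style] = WaitLoops.busySpinWait(cursor, target);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runAll(100L)));   // prints [100, 100, 100]
    }
}
```

In all three loops the producer only has to write the counter, which matches the point made above about SleepingWaitStrategy not needing a condition variable.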

### Clearing Objects From the Ring Buffer

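When data is passed through the Disruptor, objects can live longer than intended. To avoid this, the event should be cleared once it has been processed. If there is a single event handler, clearing the object in that same handler is sufficient. If there is a chain of event handlers, a dedicated handler should be placed at the very end of the chain to clear the event.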

class ObjectEvent<T>
{
    T val;

    void clear()
    {
        val = null;
    }
}

public class ClearingEventHandler<T> implements EventHandler<ObjectEvent<T>>
{
    public void onEvent(ObjectEvent<T> event, long sequence, boolean endOfBatch)
    {
        // Failing to call clear here will result in the 
        // object associated with the event to live until
        // it is overwritten once the ring buffer has wrapped
        // around to the beginning.
        event.clear(); 
    }
}

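public static void main(String[] args)
{
    // Ring buffer size, must be a power of 2
    int bufferSize = 1024;

    Disruptor<ObjectEvent<String>> disruptor = new Disruptor<>(
        () -> new ObjectEvent<String>(), bufferSize, DaemonThreadFactory.INSTANCE);

    // ProcessingEventHandler is the application's own handler (not shown here)
    disruptor
        .handleEventsWith(new ProcessingEventHandler())
        .then(new ClearingEventHandler<String>());
}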
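The ordering of the chain can be tried out without the Disruptor jar. In the sketch below, `Slot` and `ClearingChainDemo` are hypothetical names, and a plain list of `Consumer` objects stands in for the handler chain. The processing step still sees the payload, and the slot holds no reference once the final step has run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for an event that carries an object reference.
final class Slot<T> {
    T val;

    void clear() {
        val = null;
    }
}

final class ClearingChainDemo {
    // Runs each handler in order against the same slot, like a handler chain.
    static <T> void runChain(Slot<T> slot, List<Consumer<Slot<T>>> handlers) {
        for (Consumer<Slot<T>> handler : handlers) {
            handler.accept(slot);
        }
    }

    // Returns what the processing handler saw. The slot is empty afterwards.
    static String processThenClear(Slot<String> slot) {
        StringBuilder seen = new StringBuilder();
        List<Consumer<Slot<String>>> chain = new ArrayList<>();
        chain.add(s -> seen.append(s.val));   // processing handler
        chain.add(Slot::clear);               // clearing handler goes last
        runChain(slot, chain);
        return seen.toString();
    }

    public static void main(String[] args) {
        Slot<String> slot = new Slot<>();
        slot.val = "payload";
        System.out.println(processThenClear(slot) + " " + (slot.val == null));   // prints payload true
    }
}
```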