Kafka Disks and Filesystem

When reposting, please cite the source: http://www.cnblogs.com/dongxiao-yang/p/5206631.html

We recommend using multiple drives to get good throughput. To keep latency low, do not share the drives that hold Kafka data with application logs or other OS filesystem activity. You can either RAID these drives together into a single volume, or format and mount each drive as its own directory. Because Kafka already provides replication, the redundancy that RAID offers can also be provided at the application level. The choice between the two approaches involves several tradeoffs.
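For the one-directory-per-drive option, the broker is pointed at the directories through the `log.dirs` property, which takes a comma-separated list. The snippet below is only a small sketch of building that line; the mount points, the `kafka-logs` subdirectory name, and the helper `render_log_dirs` are all assumptions made for illustration.

```python
def render_log_dirs(mount_points, subdir="kafka-logs"):
    """Build a `log.dirs` line for server.properties, one directory per drive.

    `mount_points` and `subdir` are illustrative; use whatever layout
    the hosts actually have.
    """
    if not mount_points:
        raise ValueError("at least one mount point is required")
    # Normalise trailing slashes so the joined paths look consistent.
    dirs = [f"{mp.rstrip('/')}/{subdir}" for mp in mount_points]
    return "log.dirs=" + ",".join(dirs)


# Hypothetical mount points, one per physical drive.
line = render_log_dirs(["/mnt/disk1", "/mnt/disk2", "/mnt/disk3"])
print(line)
```

With a single RAID volume the same property would simply list one directory on that volume.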

If you configure multiple data directories, partitions are assigned to them round-robin. Each partition lives entirely in one data directory. If data is not well balanced among partitions, this can lead to load imbalance between disks.
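The imbalance described above can be illustrated with a toy simulation. This is a sketch of round-robin placement as the text describes it, not the broker's actual code, and placement logic may differ between Kafka versions. The partition names, sizes, and directory paths are made up.

```python
from itertools import cycle


def assign_round_robin(partition_sizes, data_dirs):
    """Place each partition wholly in one directory, cycling through directories.

    Returns (placement, usage): which directory each partition landed in,
    and the total size stored in each directory.
    """
    usage = {d: 0 for d in data_dirs}
    placement = {}
    dirs = cycle(data_dirs)
    for name, size in partition_sizes.items():
        d = next(dirs)
        placement[name] = d
        usage[d] += size  # the whole partition lives in this one directory
    return placement, usage


# Hypothetical partition sizes in GB; "events-0" is far larger than the rest.
sizes = {"events-0": 400, "events-1": 20, "events-2": 30, "events-3": 25}
placement, usage = assign_round_robin(sizes, ["/mnt/disk1", "/mnt/disk2"])
print(usage)
```

Both disks receive two partitions, yet in this made-up case one disk ends up holding almost ten times as much data as the other, because assignment counts partitions rather than bytes.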

RAID can potentially do better at balancing load between disks (although it doesn't always seem to) because it balances load at a lower level. The primary downside of RAID is that it usually imposes a large penalty on write throughput and reduces the available disk space.
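To make the disk-space cost concrete, here is a rough sketch that compares usable capacity for a few common layouts of identical disks. The function name and the example disk count and size are assumptions; the formulas are the standard ones for each RAID level and ignore filesystem and controller overhead.

```python
def usable_capacity(num_disks, disk_size_tb, layout):
    """Return usable capacity in TB for a layout built from identical disks."""
    if num_disks < 1:
        raise ValueError("need at least one disk")
    if layout in ("jbod", "raid0"):
        # Separate directories or pure striping: no capacity lost to redundancy.
        return num_disks * disk_size_tb
    if layout == "raid10":
        if num_disks < 4 or num_disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        # Every block is mirrored once, so half the raw space is usable.
        return num_disks * disk_size_tb / 2
    if layout == "raid5":
        if num_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        # One disk's worth of space goes to parity.
        return (num_disks - 1) * disk_size_tb
    if layout == "raid6":
        if num_disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        # Two disks' worth of space goes to parity.
        return (num_disks - 2) * disk_size_tb
    raise ValueError(f"unknown layout: {layout}")


# Hypothetical server with six 4 TB drives.
for layout in ("jbod", "raid10", "raid5", "raid6"):
    print(layout, usable_capacity(6, 4, layout))
```

The sketch covers capacity only. The write-throughput penalty mentioned above depends on the RAID level, the controller, and the workload, so it is not modelled here.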

Another potential benefit of RAID is the ability to tolerate disk failures. However, our experience has been that rebuilding a RAID array is so I/O intensive that it effectively disables the server, so this does not provide much real availability improvement.

Original post: https://www.cnblogs.com/dongxiao-yang/p/5206631.html