Kafka Deep Dive (3): How Kafka Deletes Data (Log) Files

As a message broker, Kafka must delete data according to well-defined rules; otherwise the ever-growing log would eventually fill the cluster's storage.

Reference: how Apache Kafka implements deletion of data (log) files

Kafka deletes data in two ways:

  • By time: messages older than the retention period are deleted
  • By size: once the log exceeds a size limit, the oldest data is deleted
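These two policies map directly to broker (or per-topic) configuration. A minimal broker-level example; the values here are illustrative, not recommendations:

```properties
# Time-based retention: delete segments older than 7 days
log.retention.hours=168
# Size-based retention: cap each partition's log at 1 GiB (-1 disables the size check)
log.retention.bytes=1073741824
# How often the cleanup thread checks for deletable segments
log.retention.check.interval.ms=300000
```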

The smallest unit Kafka deletes is a segment: individual messages are never removed, only whole segment files.
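Concretely, a partition directory holds one pair of files per segment, named after the segment's base offset (the file names below are illustrative). Deleting "old data" means removing whole pairs from the top of this list:

```
00000000000000000000.log    00000000000000000000.index   <- oldest segment, deleted first
00000000000000170210.log    00000000000000170210.index
00000000000000340412.log    00000000000000340412.index   <- active segment, never deleted
```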

The main deletion logic, from the Kafka source:

def cleanupLogs() {
  debug("Beginning log cleanup...")
  var total = 0
  val startMs = time.milliseconds
  // Logs using log compaction are handled by the log cleaner, not deleted here.
  for(log <- allLogs; if !log.config.compact) {
    debug("Garbage collecting '" + log.name + "'")
    total += cleanupExpiredSegments(log) + cleanupSegmentsToMaintainSize(log)
  }
  debug("Log cleanup completed. " + total + " files deleted in " +
        (time.milliseconds - startMs) / 1000 + " seconds")
}

Kafka calls cleanupLogs periodically (the interval is set in the configuration file), deleting all log data that is due for deletion.

cleanupExpiredSegments deletes data that has exceeded the retention time:

private def cleanupExpiredSegments(log: Log): Int = {
  val startMs = time.milliseconds
  // Delete every segment whose last-modified time is older than the retention window.
  log.deleteOldSegments(startMs - _.lastModified > log.config.retentionMs)
}
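The time predicate above can be simulated in isolation. Below is a minimal, self-contained sketch; FakeSegment and the fixed timestamps are illustrative stand-ins, not Kafka's real Log/LogSegment classes:

```scala
// Stand-in for LogSegment: only the field the time predicate looks at.
case class FakeSegment(name: String, lastModified: Long)

val retentionMs = 7L * 24 * 60 * 60 * 1000   // a 7-day retention window, like retention.ms
val now = 1700000000000L                     // fixed "current time" so the demo is deterministic

val segments = Seq(
  FakeSegment("00000000000000000000.log", now - retentionMs - 1),  // older than the window
  FakeSegment("00000000000000050000.log", now - retentionMs / 2),  // inside the window
  FakeSegment("00000000000000100000.log", now)                     // just written
)

// Same shape as the predicate in cleanupExpiredSegments:
// a segment is deletable when now - lastModified > retentionMs.
def isExpired(s: FakeSegment): Boolean = now - s.lastModified > retentionMs

val expired = segments.filter(isExpired)
println(expired.map(_.name).mkString(", "))  // only the first segment qualifies
```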

cleanupSegmentsToMaintainSize deletes data once the log exceeds its size limit:


private def cleanupSegmentsToMaintainSize(log: Log): Int = {
  // retentionSize < 0 disables the size limit; below the limit there is nothing to do.
  if(log.config.retentionSize < 0 || log.size < log.config.retentionSize)
    return 0
  // How many bytes the log is over the limit.
  var diff = log.size - log.config.retentionSize
  // Delete a segment only while removing it still leaves the log at or above the limit.
  def shouldDelete(segment: LogSegment) = {
    if(diff - segment.size >= 0) {
      diff -= segment.size
      true
    } else {
      false
    }
  }
  log.deleteOldSegments(shouldDelete)
}
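The stateful shouldDelete closure is the subtle part: diff shrinks as segments are marked for deletion. A self-contained sketch of that bookkeeping; FakeSegment, the sizes, and retentionSize are illustrative stand-ins:

```scala
// Stand-in for LogSegment; only `size` matters here.
case class FakeSegment(name: String, size: Long)

val retentionSize = 250L                     // pretend retention.bytes = 250
val segs = Seq(                              // oldest first, 500 bytes in total
  FakeSegment("a", 100),
  FakeSegment("b", 100),
  FakeSegment("c", 100),
  FakeSegment("d", 100),
  FakeSegment("e", 100)
)

// Bytes over the limit; shouldDelete mirrors the closure in cleanupSegmentsToMaintainSize.
var diff = segs.map(_.size).sum - retentionSize   // starts at 250

def shouldDelete(s: FakeSegment): Boolean =
  if (diff - s.size >= 0) { diff -= s.size; true } // still at/over the limit after deleting
  else false

// deleteOldSegments walks oldest-first and stops at the first segment it keeps.
val deleted = segs.takeWhile(shouldDelete)
println(deleted.map(_.name).mkString(", "))  // a, b
```

Note a consequence of the `diff - segment.size >= 0` check: only whole segments are removed, and only while the remainder would still be at or above the limit, so after cleanup the log can still sit somewhat above retention.bytes (here, 300 bytes remain against a 250-byte limit).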
Original post: https://www.cnblogs.com/chen-kh/p/6095538.html