Hi, I'm new to Kafka and having trouble with log compaction. I'm attempting to set up topics that compact aggressively, but so far I can't get complete compaction at all. The topic is configured like so:

Topic:beer_archive PartitionCount:20 ReplicationFactor:1 Configs:min.cleanable.dirty.ratio=0.01,delete.retention.ms=60000,segment.ms=1800000,cleanup.policy=compact

I changed the dirty ratio and segment.ms after duplicate records showed up, in an attempt to get compaction to work.

My test for success is a dump of keys, comparing the total count to the unique count. This list is produced like so:

kafka-console-consumer.sh ..... --from-beginning --property print.key=true | cut -f1 > id_file

This gives me 535,480 unique keys, and a total of 2,230,784 entries.

After I tweaked segment.ms to make the last segment eligible for compaction, SOME compaction did occur a couple of times. A sample compaction from the log:

[2015-06-12 15:51:31,440] INFO Cleaner 0: Beginning cleaning of log beer_archive-5. (kafka.log.LogCleaner)
[2015-06-12 15:51:31,441] INFO Cleaner 0: Building offset map for beer_archive-5... (kafka.log.LogCleaner)
[2015-06-12 15:51:31,580] INFO Cleaner 0: Building offset map for log beer_archive-5 for 1 segments in offset range [123847, 126857). (kafka.log.LogCleaner)
[2015-06-12 15:51:31,583] INFO Cleaner 0: Offset map for log beer_archive-5 complete. (kafka.log.LogCleaner)
[2015-06-12 15:51:31,583] INFO Cleaner 0: Cleaning log beer_archive-5 (discarding tombstones prior to Fri Jun 12 14:41:42 UTC 2015)... (kafka.log.LogCleaner)
[2015-06-12 15:51:31,583] INFO Cleaner 0: Cleaning segment 0 in log beer_archive-5 (last modified Fri Jun 12 14:42:42 UTC 2015) into 0, retaining deletes. (kafka.log.LogCleaner)
[2015-06-12 15:51:32,319] INFO Cleaner 0: Cleaning segment 123847 in log beer_archive-5 (last modified Fri Jun 12 15:26:00 UTC 2015) into 0, retaining deletes. (kafka.log.LogCleaner)
[2015-06-12 15:51:35,094] INFO Cleaner 0: Swapping in cleaned segment 0 for segment(s) 0,123847 in log beer_archive-5. (kafka.log.LogCleaner)
[2015-06-12 15:51:35,095] INFO [kafka-log-cleaner-thread-0],
    Log cleaner thread 0 cleaned log beer_archive-5 (dirty section = [123847, 126857])
    116.5 MB of log processed in 3.7 seconds (31.9 MB/sec).
    Indexed 2.5 MB in 0.1 seconds (17.2 Mb/sec, 3.9% of total time)
    Buffer utilization: 0.0%
    Cleaned 116.5 MB in 3.5 seconds (33.2 Mb/sec, 96.1% of total time)
    Start size: 116.5 MB (111,662 messages)
    End size: 115.0 MB (109,893 messages)
    1.2% size reduction (1.6% fewer messages) (kafka.log.LogCleaner)

Any ideas where I'm going wrong?

Thanks!
Shayne
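
For reference, the total-versus-unique key check described in this message could be scripted roughly as follows. This is only a sketch: count_keys is a hypothetical helper name, and it assumes the dump file (id_file above) holds exactly one key per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kafka): summarize a key dump file
# that contains one key per line.
count_keys() {
    file="$1"
    # Total number of entries (lines) in the dump.
    total=$(wc -l < "$file" | tr -d ' ')
    # Number of distinct keys.
    unique=$(sort -u "$file" | wc -l | tr -d ' ')
    # Entries beyond the first occurrence of each key.
    echo "total=$total unique=$unique duplicates=$((total - unique))"
}
```

Calling count_keys id_file prints the three counts on one line. With the figures reported above (2,230,784 entries, 535,480 unique keys), the difference is 1,695,304 entries whose key appears more than once in the dump.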