Mark Mindenhall created SAMZA-1044:
--------------------------------------

             Summary: Checkpointing requires log.cleaner.enable=true
                 Key: SAMZA-1044
                 URL: https://issues.apache.org/jira/browse/SAMZA-1044
             Project: Samza
          Issue Type: Bug
          Components: docs
         Environment: linux
            Reporter: Mark Mindenhall
            Priority: Minor


We're running Samza 0.9.1 with Kafka 0.8.2.1, which has a default setting of 
{{log.cleaner.enable=false}}.  We didn't think we needed to enable this, as we 
never created any topics with {{cleanup.policy=compact}}.  However, this 
morning we had a disk alert, and when I took a look on the broker that 
triggered the alert, one of the Samza checkpoint topics was consuming 29GB 
within the {{/logs}} folder.

Long story short, I eventually figured out that all of the checkpoint topics 
were created with {{cleanup.policy=compact}}, and were growing unbounded.  I 
set {{log.cleaner.enable=true}} on each broker, and restarted them.  Within 
minutes, the 29GB was reduced to 200-300KB.
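
For reference, a minimal sketch of the broker-side change described above. The 
file path assumes the stock Kafka layout and may differ on your install; the 
setting itself is the one named in this report.

{code}
# config/server.properties on each broker (path is an assumption)
# Turns on the log cleaner so that topics created with
# cleanup.policy=compact (such as the Samza checkpoint topics) get compacted.
log.cleaner.enable=true
{code}

In our case the change only took effect after each broker was restarted.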

I thought I must have missed this when I created our jobs with checkpointing 
enabled, so I went and scoured the docs.  There's no mention of the 
{{log.cleaner.enable}} setting within the documentation (unless I missed it 
_again_).

I should add that we've been running most of these jobs for about a year, and I 
noticed that each time we would deploy, it would take longer and longer to 
transition from {{ACCEPTED}} to {{RUNNING}} in the YARN cluster.  Eventually, 
it was taking 10-15 minutes per job, and we didn't understand why.  After 
bouncing our staging cluster with {{log.cleaner.enable=true}} (and letting the 
log cleaner finish its work), I redeployed one of our jobs, and it once again 
took 15-20 seconds from {{ACCEPTED}} to {{RUNNING}}.

Please mention in the documentation that {{log.cleaner.enable}} must be set to 
{{true}} for checkpointing to work correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
