I have a node in a cluster whose FlowFile repository grew so fast that it 
exceeded the amount of available heap space, and now it can't checkpoint. At 
least, that is my interpretation of the error:

"Cannot update journal file flowfile_repository/journals/####.journal because 
this journal has already encountered a failure when attempting to write to the 
file."

Additionally, on restart, NiFi fails to start because it runs out of heap 
space in SchemaRecordReader.readFieldValue. I'm feeling a bit stuck on where 
to go from here.

Based on metrics we collect, we see a large increase in FlowFiles on that node 
right before it crashed, and on Linux we see the following on disk:
94G     ./journals/overflow-569618072
356G    ./journals/overflow-569892338

Oh, and there is a 280 GB checkpoint file.

There are a few queues/known FlowFiles that are probably the problem, and I'm 
OK with dropping them, but there is plenty of other data in there too that I 
don't want to lose...

Thanks,
  Peter
