Hi, I recently upgraded our cluster to 1.9 from 1.7.1 and I have seen some weird behaviors relating to the flowfile repo.
Since upgrading i have had at-least 3 occurrences of the following error . nifi-app_2019-03-07_15.0.log:Caused by: java.nio.file.FileAlreadyExistsException: /flowfile_repo/nifi-1.9.0/journals/overflow-6297272 nifi-app_2019-03-07_21.0.log:Caused by: java.nio.file.FileAlreadyExistsException: /flowfile_repo/nifi-1.9.0/journals/overflow-8385089 nifi-app_2019-03-08_00.0.log:Caused by: java.nio.file.FileAlreadyExistsException: /flowfile_repo/nifi-1.9.0/journals/overflow-9081799 This seems to result in the total loss of those flow files. The flowfile_repo is located on 1+1 SSD drives. I have however not found any trances of either "To many open files" or "That the SSD went full (out of disk) during the process". The error also happened during a slow time of the day (1000 flowfiles /second) The error starts like this, and then i got 4.5GB of errors in my logfiles (47 logfiles @ 100mb). 2019-03-06 14:33:20,305 ERROR [Timer-Driven Process Thread-47] o.a.n.c.r.StandardProcessSession Failed to commit session org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update. Will roll back. 2019-03-06 14:33:20,306 ERROR [Timer-Driven Process Thread-47] o.a.n.processors.standard.ConvertRecord ConvertRecord[id=32781c92-fab3-34de-b1ff-6138f8788701] Failed to commit session StandardProcessSession[id=244416156] due to org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update; rolling back: org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:413) at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:343) at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:256) at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.nio.file.FileAlreadyExistsException: /flowfile_repo/nifi-1.9.0/journals/overflow-874061 at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384) at java.nio.file.Files.createDirectory(Files.java:674) at org.apache.nifi.wali.LengthDelimitedJournal.update(LengthDelimitedJournal.java:249) at org.apache.nifi.wali.SequentialAccessWriteAheadLog.update(SequentialAccessWriteAheadLog.java:123) at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.updateRepository(WriteAheadFlowFileRepository.java:309) at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.updateRepository(WriteAheadFlowFileRepository.java:259) at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:407) ... 10 common frames omitted 2019-03-06 14:33:20,337 ERROR [Timer-Driven Process Thread-53] o.a.n.c.r.StandardProcessSession Failed to commit session org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update. Will roll back. 2019-03-06 14:33:20,337 ERROR [Timer-Driven Process Thread-53] o.a.n.processors.standard.ConvertRecord ConvertRecord[id=32781c92-fab3-34de-b1ff-6138f8788701] Failed to commit session StandardProcessSession[id=244415981] due to org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update; rolling back: org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:413) at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:343) at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:256) at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: Stream Closed at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:326) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at org.apache.nifi.wali.LengthDelimitedJournal.update(LengthDelimitedJournal.java:308) at org.apache.nifi.wali.SequentialAccessWriteAheadLog.update(SequentialAccessWriteAheadLog.java:123) at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.updateRepository(WriteAheadFlowFileRepository.java:309) at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.updateRepository(WriteAheadFlowFileRepository.java:259) at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:407) ... 10 common frames omitted .......MANY errors After 26 rolls of the logfiles the checkpoint arrives 2019-03-06 14:34:53,671 INFO [pool-13-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 22744 records in 339 milliseconds And then a new repository file fails 2019-03-06 14:34:54,287 ERROR [Timer-Driven Process Thread-93] o.a.n.processors.standard.ConvertRecord ConvertRecord[id=32781c92-fab3-34de-b1ff-6138f8788701] Failed to commit session StandardProcessSession[id=246475219] due to org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update; rolling back: org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:413) at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:343) at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:256) at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.nio.file.FileAlreadyExistsException: /flowfile_repo/nifi-1.9.0/journals/overflow-876686 The following also start showing up after a while. 2019-03-06 14:33:22,363 ERROR [Timer-Driven Process Thread-19] o.a.n.c.r.c.StandardResourceClaimManager Decremented claimant count for StandardResourceClaim[id=1551879190172-8371, container=default, section=179] to -3333 2019-03-06 14:33:22,363 WARN [Timer-Driven Process Thread-53] o.a.n.c.r.c.StandardResourceClaimManager Decrementing claimant count for StandardResourceClaim[id=1551879161479-8345, container=default, section=153] but claimant count is not known. Returning -1 2019-03-06 14:33:22,363 WARN [Timer-Driven Process Thread-47] o.a.n.c.r.c.StandardResourceClaimManager Decrementing claimant count for StandardResourceClaim[id=1551879161479-8346, container=default, section=154] but claimant count is not known. Returning -1 2019-03-06 14:33:22,363 WARN [Timer-Driven Process Thread-47] o.a.n.c.r.c.StandardResourceClaimManager Decrementing claimant count for StandardResourceClaim[id=1551879161479-8345, container=default, section=153] but claimant count is not known. Returning -1 2019-03-06 14:33:22,363 ERROR [Timer-Driven Process Thread-94] o.a.n.c.r.c.StandardResourceClaimManager Decremented claimant count for StandardResourceClaim[id=1551879190172-8371, container=default, section=179] to -3334 Any takes on what I could look into more for details? I have rolled back the cluster to 1.7.1 as of now, but I still have 1.9 on my test cluster so I will try to reproduce it in there. Best regards Viking
