Hello all,

We recently upgraded to kafka_2.10-0.10.1.1. We have a source topic with replication factor 3 and 40 partitions, and a streams application running with NUM_STREAM_THREADS_CONFIG = 4 on each of three machines, so 12 threads in total.
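For reference, the relevant part of our configuration looks roughly like the sketch below. The application id and bootstrap servers are placeholders (they are not shown in this post); the thread count and state directory match the setup and paths described here.

```java
import java.util.Properties;

public class StreamsProps {
    public static Properties build() {
        Properties props = new Properties();
        // Hypothetical application id and brokers; the real values are omitted.
        props.put("application.id", "new-part");
        props.put("bootstrap.servers", "localhost:9092");
        // NUM_STREAM_THREADS_CONFIG = 4 per instance; 3 instances = 12 threads total.
        props.put("num.stream.threads", "4");
        // State store root directory, as seen in the error paths below.
        props.put("state.dir", "/data/kafka-streams");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("num.stream.threads"));
    }
}
```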
We start the same streams application one by one on the three machines. After some time we noticed that the streams application on one of the machines had crashed. Inspecting the log, we found two sets of errors. The first looks like:

stream-thread [StreamThread-3] Failed to commit StreamTask 0_38 state:
org.apache.kafka.streams.errors.ProcessorStateException: task [0_38] Failed to flush state store key-table
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:331) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:267) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:576) [kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:562) [kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:538) [kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:456) [kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242) [kafka-streams-0.10.1.1.jar:na]
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error while executing flush from store key-table-201702052100
    at org.apache.kafka.streams.state.internals.RocksDBStore.flushInternal(RocksDBStore.java:375) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBStore.flush(RocksDBStore.java:366) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore.flush(RocksDBWindowStore.java:256) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.state.internals.MeteredWindowStore.flush(MeteredWindowStore.java:116) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.state.internals.CachingWindowStore.flush(CachingWindowStore.java:119) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:329) ~[kafka-streams-0.10.1.1.jar:na]
    ... 6 common frames omitted
Caused by: org.rocksdb.RocksDBException: IO error: /data/kafka-streams/new-part/0_38/key-table/key-table-201702052100/000009.sst: No such file or directory
    at org.rocksdb.RocksDB.flush(Native Method) ~[rocksdbjni-4.9.0.jar:na]
    at org.rocksdb.RocksDB.flush(RocksDB.java:1360) ~[rocksdbjni-4.9.0.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBStore.flushInternal(RocksDBStore.java:373) ~[kafka-streams-0.10.1.1.jar:na]
    ... 11 common frames omitted

When we looked for the directory /data/kafka-streams/new-part/0_38/key-table/key-table-201702052100/ on disk, it did not exist. So it looks like this directory was deleted earlier by some task while another task was still trying to flush it. What could be the possible reason for this?
The second set of errors we see is this:

org.apache.kafka.streams.errors.StreamsException: stream-thread [StreamThread-1] Failed to rebalance
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:410) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242) ~[kafka-streams-0.10.1.1.jar:na]
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error opening store key-table-201702052000 at location /data/kafka-streams/new-part/0_38/key-table/key-table-201702052000
    at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:190) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:159) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore$Segment.openDB(RocksDBWindowStore.java:72) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore.getOrCreateSegment(RocksDBWindowStore.java:388) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore.putInternal(RocksDBWindowStore.java:319) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore.access$000(RocksDBWindowStore.java:51) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore$1.restore(RocksDBWindowStore.java:206) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.restoreActiveState(ProcessorStateManager.java:235) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.register(ProcessorStateManager.java:198) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.register(ProcessorContextImpl.java:123) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBWindowStore.init(RocksDBWindowStore.java:200) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.state.internals.MeteredWindowStore.init(MeteredWindowStore.java:66) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.state.internals.CachingWindowStore.init(CachingWindowStore.java:64) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.AbstractTask.initializeStateStores(AbstractTask.java:81) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:119) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:633) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:660) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.access$100(StreamThread.java:69) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:124) ~[kafka-streams-0.10.1.1.jar:na]
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228) ~[kafka-clients-0.10.1.1.jar:na]
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313) ~[kafka-clients-0.10.1.1.jar:na]
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277) ~[kafka-clients-0.10.1.1.jar:na]
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259) ~[kafka-clients-0.10.1.1.jar:na]
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013) ~[kafka-clients-0.10.1.1.jar:na]
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979) ~[kafka-clients-0.10.1.1.jar:na]
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:407) ~[kafka-streams-0.10.1.1.jar:na]
    ... 1 common frames omitted
Caused by: org.rocksdb.RocksDBException: IO error: lock /data/kafka-streams/new-part/0_38/key-table/key-table-201702052000/LOCK: No locks available
    at org.rocksdb.RocksDB.open(Native Method) ~[rocksdbjni-4.9.0.jar:na]
    at org.rocksdb.RocksDB.open(RocksDB.java:184) ~[rocksdbjni-4.9.0.jar:na]
    at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:183) ~[kafka-streams-0.10.1.1.jar:na]
    ... 26 common frames omitted

After this the whole application shuts down. We do not understand this second error, or whether it is related to the first one. Please let us know what could be causing these errors, whether they have already been fixed, and whether there is any way to work around them in the current release.

Thanks,
Sachin
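Until the root cause is clear, one thing we are considering is registering an uncaught exception handler so thread deaths are at least logged before the application goes down. KafkaStreams exposes setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler) for this; the JDK-only sketch below shows the pattern with a plain thread standing in for a StreamThread, so it runs without the Kafka jars. The class and thread names here are illustrative only.

```java
// Sketch of the uncaught-exception-handler pattern. With Kafka Streams the
// same handler would be passed to KafkaStreams#setUncaughtExceptionHandler
// before start(); here an ordinary thread stands in for a StreamThread.
public class HandlerDemo {
    static volatile String lastFailure;

    public static void main(String[] args) throws InterruptedException {
        Thread.UncaughtExceptionHandler handler = (thread, error) -> {
            // Record (or alert on) the dying thread instead of losing the error.
            lastFailure = thread.getName() + ": " + error.getMessage();
        };

        Thread worker = new Thread(() -> {
            throw new RuntimeException("simulated state store failure");
        }, "StreamThread-demo");
        worker.setUncaughtExceptionHandler(handler);
        worker.start();
        worker.join();

        System.out.println(lastFailure);
    }
}
```

This does not prevent the crash, but it makes the failure visible to monitoring instead of the instance dying silently.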