Hi, We are getting some deadlock state after re-balance and what we found that it happens after re-balance the partition is assigned to a different thread in same instance.
Looks like previous thread is not releasing the rocks db lock. Here are the complete logs spanned across multiple files. https://drive.google.com/open?id=0B-KsMy4P1bGxLTVVVWlZOGdQZTUzYm9aV3lfOFdubTJ2X2JR I am just posting the gist here as how we understood this issue. *===== TestKafkaAdvice.2017-03-01.89.log ========* 53352 INFO 2017-03-01 18:11:56,980 [StreamThread-3]: org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Successfully joined group new-part-advice with *generation 96* 53353 INFO 2017-03-01 18:11:56,980 [StreamThread-4]: org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Successfully joined group new-part-advice with *generation 96* 53354 INFO 2017-03-01 18:11:56,980 [StreamThread-1]: org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Successfully joined group new-part-advice with *generation 96* 53361 INFO 2017-03-01 *18:11:57,084* [StreamThread-1]: org.apache.kafka.streams.processor.internals.StreamThread - stream-thread *[StreamThread-1] New partitions [[advice-stream-9, advice-strea**m-6]]* assigned at the end of consumer rebalance. 53362 INFO 2017-03-01 18:11:57,084 [StreamThread-3]: org.apache.kafka.streams.processor.internals.StreamThread - stream-thread *[StreamThread-3] New partitions [[advice-stream-7]]* assigned at the end of consumer rebalance. 53363 DEBUG 2017-03-01 18:11:57,084 [StreamThread-3]: org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [StreamThread-3] closing suspended non-assigned task 53364 INFO 2017-03-01 18:11:57,084 [StreamThread-4]: org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Setting newly assigned partitions [advice-stream-8, advice-stream-2] fo r group new-part-advice 53365 INFO 2017-03-01 18:11:57,084 [StreamThread-4]: org.apache.kafka.streams.processor.internals.StreamThread - stream-thread *[StreamThread-4] New partitions [[advice-stream-8, advice-strea**m-2]]* assigned at the end of consumer rebalance. 54253 INFO 2017-03-01 18:13:19,146 *[StreamThread-4]*: org.apache.kafka.streams.processor.internals.StreamThread - stream-thread *[StreamThread-4] Creating active task 0_2 with assigned partiti**ons [[advice-stream-2]]* 56789 INFO 2017-03-01 18:17:42,311 [StreamThread-1]: org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Successfully joined group new-part-advice with *generation 97* 56790 INFO 2017-03-01 18:17:42,311 [StreamThread-3]: org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Successfully joined group new-part-advice with *generation 97* 56796 INFO 2017-03-01 *18:17:42,312* [StreamThread-3]: org.apache.kafka.streams.processor.internals.StreamThread - stream-thread *[StreamThread-3] New partitions [[advice-stream-9, advice-strea**m-6]]* assigned at the end of consumer rebalance. 56805 INFO 2017-03-01 18:17:42,312 [StreamThread-1]: org.apache.kafka.streams.processor.internals.StreamThread - stream-thread *[StreamThread-1] New partitions [[advice-stream-8, advice-strea**m-4]]* assigned at the end of consumer rebalance. 56823 INFO 2017-03-01 18:17:42,465 *[StreamThread-1]*: org.apache.kafka.streams.processor.internals.StreamThread - stream-thread *[StreamThread-1] Creating active task 0_4 with assigned partiti**ons [[advice-stream-4]]* 56831 DEBUG 2017-03-01 18:17:42,531 *[StreamThread-1]*: org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - Group new-part-advice *fetching committed offsets for partitions: [advi**ce-stream-8, advice-stream-4]* 57769 INFO 2017-03-01 *18:20:16,291* [StreamThread-4]: org.apache.kafka.streams.processor.internals.StreamThread - stream-thread *[StreamThread-4] Creating active task 0_8 with assigned partiti**ons [[advice-stream-8]]* 57935 INFO 2017-03-01 18:20:19,181 [StreamThread-4]: org.apache.kafka.streams.processor.internals.StreamThread - stream-thread *[StreamThread-4] Removing all active tasks [[0_2, 0_8]]* *======== TestKafkaAdvice.2017-03-01.90.log ============* 742 INFO 2017-03-01 *18:22:42,697* [StreamThread-3]: org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Successfully joined group new-part-advice with *generation 98* 743 INFO 2017-03-01 18:22:42,697 [StreamThread-4]: org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Successfully joined group new-part-advice with *generation 98* 746 INFO 2017-03-01 18:22:42,697 [StreamThread-3]: org.apache.kafka.streams. processor.internals.StreamThread - stream-thread *[StreamThread-3] New partitions [[advice-stream-9, advice-stream**-4]]* assigned at the end of consumer rebalance. 748 INFO 2017-03-01 18:22:42,712 [StreamThread-4]: org.apache.kafka.streams. processor.internals.StreamThread - stream-thread *[StreamThread-4] New partitions [[advice-stream-8, advice-stream**-2]]* assigned at the end of consumer rebalance. ............... 4502 WARN 2017-03-01 *18:26:23,323* *[StreamThread-3]*: org.apache.kafka.streams.processor.internals.StreamThread - *Could not create task 0_4*. Will retry. *************************************** Some time later.......* 4444 INFO 2017-03-01 18:25:44,071 [StreamThread-4]: org.apache.kafka.streams.processor.internals.StreamThread - stream-thread *[StreamThread-4] Removing all active tasks [[0_2, 0_8]]* ************************************ *Task 0_2 (from line # 54253) moved to a different instance's thread-3 and Task 0_8 got re-assigned to Thread-4 (line # 4584 below)* 4474 INFO 2017-03-01 18:26:13,443 [StreamThread-1]: org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [*StreamThread-1] Creating active task 0_8 with assigned partitio**ns [[advice-stream-8]]* 4475 WARN 2017-03-01 18:26:14,445 *[StreamThread-1]*: org.apache.kafka.streams.processor.internals.StreamThread - *Could not create task 0_8*. Will retry. 4581 INFO 2017-03-01 *18:27:43,048* [StreamThread-4]: org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Successfully joined group new-part-advice with generation 99 4584 INFO 2017-03-01 18:27:43,048 [StreamThread-4]: org.apache.kafka.streams.processor.internals.StreamThread - stream-thread *[StreamThread-4] New partitions [[advice-stream-8, advice-stream**-4]]* assigned at the end of consumer rebalance. ************************************* Task Task 0_4 lock's as per line #56823 above still seems to be with Thread-1 !!!* 4625 WARN 2017-03-01 *18:27:44,092* *[StreamThread-4]*: org.apache.kafka.streams.processor.internals.StreamThread - *Could not create task 0_4.* Will retry. 4646 WARN 2017-03-01 18:27:45,145 *[StreamThread-4]*: org.apache.kafka.streams.processor.internals.StreamThread - Could not create task 0_4. Will retry. 4955 WARN 2017-03-01 18:29:37,508 *[StreamThread-4]*: org.apache.kafka.streams.processor.internals.StreamThread - Could not create task 0_4. Will retry. 4982 WARN 2017-03-01 18:29:49,125 *[StreamThread-3]*: org.apache.kafka.streams.processor.internals.StreamThread - Could not create task 0_4. Will retry. 5073 WARN 2017-03-01 18:31:20,911 *[StreamThread-4]*: org.apache.kafka.streams.processor.internals.StreamThread - Could not create task 0_4. Will retry. 5173 WARN 2017-03-01 18:34:46,714 *[StreamThread-4]*: org.apache.kafka.streams.processor.internals.StreamThread - Could not create task 0_4. Will retry. 5199 WARN 2017-03-01 18:36:39,729 *[StreamThread-3]*: org.apache.kafka.streams.processor.internals.StreamThread - Could not create task 0_4. Will retry. 5248 WARN 2017-03-01 18:41:37,329 *[StreamThread-4]*: org.apache.kafka.streams.processor.internals.StreamThread - Could not create task 0_4. Will retry. 5279 WARN 2017-03-01 18:50:19,977 *[StreamThread-3]*: org.apache.kafka.streams.processor.internals.StreamThread - Could not create task 0_4. Will retry. 5397 WARN 2017-03-01 *19:19:13,900* *[StreamThread-1]*: org.apache.kafka.streams.processor.internals.StreamThread - *Could not create task 0_8*. Will retry. 5721 WARN 2017-03-01 *22:03:07,629 [StreamThread-1]*: org.apache.kafka.streams.processor.internals.StreamThread - *Could not create task 0_8*. Will retry. 5698 WARN 2017-03-01 *22:01:32,017* *[StreamThread-3]*: org.apache.kafka.streams.processor.internals.StreamThread - Could not create task 0_4. Will retry. 5746 WARN 2017-03-01 *22:06:29,340* *[StreamThread-4]*: org.apache.kafka.streams.processor.internals.StreamThread - *Could not create task 0_4.* Will retry. *======== TestKafkaAdvice.log ============* 148 WARN 2017-03-02 *01:41:35,831* *[StreamThread-1]*: org.apache.kafka.streams.processor.internals.StreamThread - *Could not create task 0_8.* Will retry. 735 WARN 2017-03-02 *08:58:31,234* *[StreamThread-1]*: org.apache.kafka.streams.processor.internals.StreamThread - *Could not create task 0_8*. Will retry. 712 WARN 2017-03-02 *08:56:55,810 [StreamThread-3]*: org.apache.kafka.streams.processor.internals.StreamThread - Could not create task 0_4. Will retry. 760 WARN 2017-03-02 *09:01:52,945* *[StreamThread-4]*: org.apache.kafka.streams.processor.internals.StreamThread - *Could not create task 0_4*. Will retry. *********************************** Finally all threads reach the deadlock state where they cannot create task* Please let us know, why we get such a deadlock state. Thanks Sachin