[ https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang updated KAFKA-2168: --------------------------------- Labels: new-consumer-threading-should-fix (was: ) > New consumer poll() can block other calls like position(), commit(), and > close() indefinitely > --------------------------------------------------------------------------------------------- > > Key: KAFKA-2168 > URL: https://issues.apache.org/jira/browse/KAFKA-2168 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer > Reporter: Ewen Cheslack-Postava > Assignee: Jason Gustafson > Priority: Critical > Labels: new-consumer-threading-should-fix > Fix For: 0.9.0.0 > > Attachments: KAFKA-2168.patch, KAFKA-2168_2015-06-01_16:03:38.patch, > KAFKA-2168_2015-06-02_17:09:37.patch, KAFKA-2168_2015-06-03_18:20:23.patch, > KAFKA-2168_2015-06-03_21:06:42.patch, KAFKA-2168_2015-06-04_14:36:04.patch, > KAFKA-2168_2015-06-05_12:01:28.patch, KAFKA-2168_2015-06-05_12:44:48.patch, > KAFKA-2168_2015-06-11_14:09:59.patch, KAFKA-2168_2015-06-18_14:39:36.patch, > KAFKA-2168_2015-06-19_09:19:02.patch, KAFKA-2168_2015-06-22_16:34:37.patch, > KAFKA-2168_2015-06-23_09:39:07.patch, KAFKA-2168_2015-06-30_10:54:22.patch > > > The new consumer is currently using very coarse-grained synchronization. For > most methods this isn't a problem since they finish quickly once the lock is > acquired, but poll() might run for a long time (and commonly will since > polling with long timeouts is a normal use case). This means any operations > invoked from another thread may block until the poll() call completes. > Some example use cases where this can be a problem: > * A shutdown hook is registered to trigger shutdown and invokes close(). It > gets invoked from another thread and blocks indefinitely. > * User wants to manage offset commit themselves in a background thread. If > the commit policy is not purely time based, it's not currently possibly to > make sure the call to commit() will be processed promptly. > Two possible solutions to this: > 1. Make sure a lock is not held during the actual select call. Since we have > multiple layers (KafkaConsumer -> NetworkClient -> Selector -> nio Selector) > this is probably hard to make work cleanly since locking is currently only > performed at the KafkaConsumer level and we'd want it unlocked around a > single line of code in Selector. > 2. Wake up the selector before synchronizing for certain operations. This > would require some additional coordination to make sure the caller of > wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() > thread being woken up and then promptly reacquiring the lock with a > subsequent long poll() call). -- This message was sent by Atlassian Jira (v8.20.1#820001)