There is a bug in the 0.9.0.1 client that causes consumers to get stuck waiting for a connection to finish connecting.

The root cause is in the connect(...) method of

clients/src/main/java/org/apache/kafka/common/network/Selector.java

Here's the troublesome code:

        try {
            socketChannel.connect(address);
        } catch (UnresolvedAddressException e) {

The return value is ignored. The assumption is that socketChannel.connect(address) always returns false when in non-blocking mode. A good assumption... but, sadly, wrong.

When spinning up several dozen consumers at the same time, we see a small number (one or two) where socketChannel.connect(...) returns true. When that happens, the connection is already established and SelectionKey.OP_CONNECT will never be triggered. The poll(long timeout) method in the same class waits for key.isConnectable() to report that the channel is ready, but that never happens because the channel was fully connected before select was called.
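
To make the failure mode concrete, here is a minimal, self-contained sketch of how a non-blocking connect could check the return value instead of relying on OP_CONNECT. This is not the Kafka Selector code and not the fix I tested; the class name ImmediateConnectSketch, the helper connectNonBlocking, the demo method, and the 5000 ms timeout are all made up for illustration. It only uses standard java.nio calls.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical illustration only; not the Kafka Selector class.
class ImmediateConnectSketch {

    // Connects in non-blocking mode and returns true once the channel is
    // connected, or false if the timeout expires first.
    static boolean connectNonBlocking(Selector selector, SocketChannel channel,
                                      InetSocketAddress address, long timeoutMs)
            throws IOException {
        channel.configureBlocking(false);

        // In non-blocking mode connect() usually returns false, but it may
        // return true if the connection is established immediately.
        if (channel.connect(address)) {
            // Already connected: OP_CONNECT will never fire for this channel,
            // so register for reads directly instead of waiting on it.
            channel.register(selector, SelectionKey.OP_READ);
            return true;
        }

        // Normal case: wait for OP_CONNECT, then finish the connection.
        SelectionKey key = channel.register(selector, SelectionKey.OP_CONNECT);
        long deadline = System.currentTimeMillis() + timeoutMs;
        long remaining;
        while ((remaining = deadline - System.currentTimeMillis()) > 0) {
            selector.select(remaining);
            boolean ready = key.isValid() && key.isConnectable();
            selector.selectedKeys().clear();
            if (ready && channel.finishConnect()) {
                key.interestOps(SelectionKey.OP_READ);
                return true;
            }
        }
        return false;
    }

    // Connects to a throwaway listener on the loopback interface.
    static boolean demo() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open();
             SocketChannel channel = SocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress address = (InetSocketAddress) server.getLocalAddress();
            return connectNonBlocking(selector, channel, address, 5000);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("connected=" + demo());
    }
}
```

Whether a given run takes the immediate path or the OP_CONNECT path depends on the platform and timing, which matches what we see: only a few consumers out of several dozen hit the immediate case.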

I implemented a sloppy fix, which demonstrated that handling this case solves my stuck consumer problem.

How do I submit a bug report for this issue, or does this email constitute a bug report?

--Larkin
