There is a bug in the 0.9.0.1 client that causes consumers to get stuck
waiting for a connection to become ready.
The root cause is in the connect(...) method of
clients/src/main/java/org/apache/kafka/common/network/Selector.java.
Here is the problematic code:
    try {
        socketChannel.connect(address);
    } catch (UnresolvedAddressException e) {
The return value is ignored, on the assumption that
socketChannel.connect(address) always returns false when in non-blocking
mode. A good assumption... but, sadly, wrong.
When spinning up several dozen consumers at the same time, we see a small
number (one or two) where socketChannel.connect(...) returns true. When
that happens, the connection is already established and
SelectionKey.OP_CONNECT is never triggered. The poll(long timeout) method
in the same class waits for the channel to become ready by checking
key.isConnectable(), but that never happens because the channel is
already fully connected before select is called.
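
For illustration, here is a minimal sketch of one way to handle the
immediate-connect case. It is not the actual Kafka code or a proposed
patch. The class name ImmediateConnectSketch, the immediatelyConnected
queue, and the count returned by poll are hypothetical; only the
java.nio calls are real. The SocketChannel documentation does say that
a non-blocking connect can return true if the connection is established
immediately, which is the case this tracks separately.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;

// Hypothetical sketch: remembers channels whose non-blocking connect()
// finished immediately, since OP_CONNECT will never fire for them.
final class ImmediateConnectSketch {
    private final Selector nioSelector;
    private final Queue<SelectionKey> immediatelyConnected = new ArrayDeque<>();

    ImmediateConnectSketch(Selector nioSelector) {
        this.nioSelector = nioSelector;
    }

    SelectionKey connect(SocketChannel channel, InetSocketAddress address) throws IOException {
        channel.configureBlocking(false);
        // Keep the return value instead of discarding it.
        boolean connected = channel.connect(address);
        SelectionKey key = channel.register(nioSelector, SelectionKey.OP_CONNECT);
        if (connected) {
            // Already connected: stop waiting for OP_CONNECT and queue the key
            // so the next poll() treats it as a completed connection.
            key.interestOps(0);
            immediatelyConnected.add(key);
        }
        return key;
    }

    // Returns the number of connections completed during this call.
    // timeoutMs must be positive here; Selector.select(0) blocks indefinitely.
    int poll(long timeoutMs) throws IOException {
        int completed = 0;
        while (immediatelyConnected.poll() != null) {
            completed++;
        }
        if (completed > 0) {
            nioSelector.selectNow(); // work is already pending, so do not block
        } else {
            nioSelector.select(timeoutMs);
        }
        Iterator<SelectionKey> it = nioSelector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (key.isValid() && key.isConnectable()) {
                SocketChannel ch = (SocketChannel) key.channel();
                if (ch.finishConnect()) {
                    key.interestOps(0);
                    completed++;
                }
            }
        }
        return completed;
    }
}
```

With something along these lines, a channel is reported as connected by
poll whether connect(...) returned true or false, so the caller no
longer depends on OP_CONNECT being delivered.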
I implemented a quick-and-dirty fix, which demonstrated that handling
this case solves my stuck-consumer problem.
How do I submit a bug report for this issue, or does this email
constitute a bug report?
--Larkin