Just for posterity: what happened here was an issue with the hostname.

ERROR 2016-01-08 22:02:09,675 [main] [none] c.k.messaging.kafka.ConsumerGroup: ip-10-100-102-52: ip-10-100-102-52: unknown error
! java.net.UnknownHostException: ip-10-100-102-52: unknown error
! at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[na:1.8.0_65]
! at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) ~[na:1.8.0_65]
! at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) ~[na:1.8.0_65]
! at java.net.InetAddress.getLocalHost(InetAddress.java:1500) ~[na:1.8.0_65]
! ... 63 common frames omitted
! Causing: java.net.UnknownHostException: ip-10-100-102-52: ip-10-100-102-52: unknown error
! at java.net.InetAddress.getLocalHost(InetAddress.java:1505) ~[na:1.8.0_65]
! at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:119) ~[k2-app-1.0-RC52.jar:na]
! at kafka.javaapi.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:66) ~[k2-app-1.0-RC52.jar:na]
! at kafka.javaapi.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:69) ~[k2-app-1.0-RC52.jar:na]
! at kafka.consumer.Consumer$.createJavaConsumerConnector(ConsumerConnector.scala:105) ~[k2-app-1.0-RC52.jar:na]
! at kafka.consumer.Consumer.createJavaConsumerConnector(ConsumerConnector.scala) ~[k2-app-1.0-RC52.jar:na]
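
The trace above shows the exception coming out of InetAddress.getLocalHost(), called from the ZookeeperConsumerConnector constructor. As a rough sketch, a check like the following could be run on a new node before starting the consumer, to see whether the JVM can resolve the machine's own hostname. It uses only the JDK; the class name HostnameCheck and its method names are made up for illustration and are not part of Kafka.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

// Hypothetical helper, not part of Kafka: checks hostname resolution up front.
class HostnameCheck {

    /** Resolves an arbitrary name or IP literal; empty if the lookup fails. */
    static Optional<InetAddress> tryResolve(String hostname) {
        try {
            return Optional.of(InetAddress.getByName(hostname));
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    /** Makes the same JDK call that fails in the trace: InetAddress.getLocalHost(). */
    static Optional<InetAddress> tryLocalHost() {
        try {
            return Optional.of(InetAddress.getLocalHost());
        } catch (UnknownHostException e) {
            // Print the cause explicitly so it cannot go unnoticed.
            System.err.println("Local hostname does not resolve: " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<InetAddress> local = tryLocalHost();
        if (!local.isPresent()) {
            // Fail fast with a clear message instead of failing inside the consumer.
            System.exit(1);
        }
        System.out.println("Local host resolves to " + local.get());
    }
}
```

If this exits with an error on a node, the consumer would be expected to hit the same UnknownHostException there. Adding the hostname to /etc/hosts is one common way to make the lookup succeed, though the thread does not say exactly how the hostname was fixed in this case.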
This is the error on the client. In the stack trace above, ip-10-100-102-52 is the hostname of the client connecting to Zookeeper. Setting the hostname correctly fixes this. I'm still not sure why the client hostname would be a problem here, but at least it’s a lesson learnt. (Probably a combination of factors caused this exception to be completely swallowed, but I think that’s a different topic.)

Cos

On Friday, 8 January 2016 at 11:16, Cosmin Marginean wrote:

> Hi Marko, this seems to have solved this. Dealing with another issue now,
> which I’ll report separately.
> Thank you for your help!
>
> Cheers
> Cos
>
>
> On Friday, 8 January 2016 at 09:27, Cosmin Marginean wrote:
>
> > Hi Marko, I will migrate the code and also change the timeout. Thanks for
> > your suggestions. Will post a status once I’ve tested.
> >
> > Cheers
> > Cos
> >
> >
> > On Thursday, 7 January 2016 at 22:59, Marko Bonaći wrote:
> >
> > > Actually, why don't you use the same code as outlined here (that includes
> > > timeout in props):
> > > http://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
> > >
> > > Marko Bonaći
> > > Monitoring | Alerting | Anomaly Detection | Centralized Log Management
> > > Solr & Elasticsearch Support
> > > Sematext <http://sematext.com/> | Contact
> > > <http://sematext.com/about/contact.html>
> > >
> > > On Thu, Jan 7, 2016 at 11:55 PM, Marko Bonaći <marko.bon...@sematext.com>
> > > wrote:
> > >
> > > > Hi Cosmin,
> > > > do you have default server configuration on these new nodes you're
> > > > setting up?
> > > > I'd check consumer's socket.timeout.ms, maybe someone set it to 30
> > > > instead of 30 000 :)
> > > > Speaking from my own experience (I had the same symptom and this turned
> > > > out to be the cause).
> > > >
> > > > Marko Bonaći
> > > >
> > > > On Thu, Jan 7, 2016 at 11:23 PM, Cosmin Marginean
> > > > <cosmargin...@gmail.com> wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > I have a straightforward piece of code that creates a consumer (Kafka
> > > > > 0.9.0.0).
> > > > >
> > > > > Properties props = new Properties();
> > > > > props.put("zookeeper.connect", zookeeperServers);
> > > > > props.put(org.apache.kafka.clients.consumer.ConsumerConfig.GROUP_ID_CONFIG, groupId);
> > > > > log.info("Starting consumer group for topic {} and group ID {}. Zookeeper servers: {}",
> > > > >         topic, groupId, zookeeperServers);
> > > > > consumer = kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
> > > > > log.info("Consumer group started for topic {} and group ID {}", topic, groupId);
> > > > >
> > > > > We’ve run this countless times without any issues, but now we’re
> > > > > deploying a new environment (AWS, just like the ones before) and it
> > > > > appears that the client Java process dies entirely (without any
> > > > > logs/crash report/etc). This happens right after logging the
> > > > > “Starting consumer group..”, so presumably when it tries to
> > > > > createJavaConsumerConnector.
> > > > >
> > > > > Admittedly, this might be “environmental”, but even though we triple
> > > > > checked everything (network setup, kafka logs, zookeeper logs, etc),
> > > > > we couldn’t identify anything suspicious yet. So what I'd like to
> > > > > know is if there’s a way to add further Kafka diagnosis/logging.
> > > > > Attached (trace-logging.txt) is further logging after turning
> > > > > everything to TRACE, and at the top you can see the message “Starting
> > > > > consumer…”, but with nothing really suspicious as far as I can tell.
> > > > >
> > > > >
> > > > > As an additional piece of information, Zookeeper does report the
> > > > > following when this happens
> > > > >
> > > > > 2016-01-07 21:58:44,763 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
> > > > > EndOfStreamException: Unable to read additional data from client sessionid 0x1521e14797c0001, likely client has closed socket
> > > > > at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> > > > > at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> > > > > at java.lang.Thread.run(Thread.java:745)
> > > > > 2016-01-07 21:58:44,764 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.100.101.159:41613 which had sessionid 0x1521e14797c0001
> > > > >
> > > > >
> > > > > Any suggestions would be appreciated.
> > > > >
> > > > > Thank you
> > > > >
> > > > > Cosmin
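
On the original question about getting more diagnostics when the process dies without any logs: one generic JVM-level option, independent of Kafka, is to print any exception at the point where a startup step fails and to install a default uncaught-exception handler as a last resort. The sketch below is only an illustration under those assumptions. StartupDiagnostics and its methods are invented names, and the lambda in main is a placeholder for the real createJavaConsumerConnector call. It is not known whether this would have surfaced the exception in this particular case, since the reason it was swallowed was never identified.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Kafka: makes startup failures visible on stderr.
class StartupDiagnostics {

    /** Last-resort handler: prints exceptions that escape any thread. */
    static void installUncaughtHandler() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            System.err.println("Uncaught exception in thread " + thread.getName());
            error.printStackTrace();
        });
    }

    /** Runs one startup step and prints any Throwable before rethrowing it. */
    static <T> T logged(String step, Supplier<T> action) {
        try {
            return action.get();
        } catch (Throwable t) {
            // Supplier.get() can only throw unchecked exceptions or errors,
            // so rethrowing here needs no throws clause.
            System.err.println("Startup step failed: " + step);
            t.printStackTrace();
            throw t;
        }
    }

    public static void main(String[] args) {
        installUncaughtHandler();
        // Placeholder for the real step, e.g. creating the consumer connector.
        String result = logged("create consumer", () -> "consumer created");
        System.out.println(result);
    }
}
```

The handler only sees exceptions that actually escape a thread. An exception that is caught and discarded somewhere, or a process that exits through System.exit, would still need to be found by wrapping the suspect call directly, as logged does here.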