Just for posterity: what happened here was an issue with the hostname

ERROR 2016-01-08 22:02:09,675 [main] [none] c.k.messaging.kafka.ConsumerGroup: ip-10-100-102-52: ip-10-100-102-52: unknown error
! java.net.UnknownHostException: ip-10-100-102-52: unknown error
! at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[na:1.8.0_65]
! at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) ~[na:1.8.0_65]
! at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) ~[na:1.8.0_65]
! at java.net.InetAddress.getLocalHost(InetAddress.java:1500) ~[na:1.8.0_65]
! ... 63 common frames omitted
! Causing: java.net.UnknownHostException: ip-10-100-102-52: ip-10-100-102-52: unknown error
! at java.net.InetAddress.getLocalHost(InetAddress.java:1505) ~[na:1.8.0_65]
! at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:119) ~[k2-app-1.0-RC52.jar:na]
! at kafka.javaapi.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:66) ~[k2-app-1.0-RC52.jar:na]
! at kafka.javaapi.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:69) ~[k2-app-1.0-RC52.jar:na]
! at kafka.consumer.Consumer$.createJavaConsumerConnector(ConsumerConnector.scala:105) ~[k2-app-1.0-RC52.jar:na]
! at kafka.consumer.Consumer.createJavaConsumerConnector(ConsumerConnector.scala) ~[k2-app-1.0-RC52.jar:na]



This is the error on the client. In the stack trace above, ip-10-100-102-52 is 
the hostname of the client connecting to Zookeeper. Setting the hostname 
correctly fixes this. I’m still not sure why the client hostname would be a 
problem here, but at least it’s a lesson learnt.
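
For anyone finding this thread later, here is a minimal sketch of a pre-flight 
check, not anything from our actual code. The trace above shows the exception 
coming out of InetAddress.getLocalHost(), called from the 
ZookeeperConsumerConnector constructor, so running the same JDK call on a new 
box should show whether the hostname resolves before the app is started. The 
class and method names (HostnameCheck, checkLocalHost) are made up for 
illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Kafka or of our application.
public class HostnameCheck {

    // Runs the same JDK lookup that appears in the stack trace and
    // returns a one-line report instead of throwing.
    static String checkLocalHost() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return "OK: " + local.getHostName() + " -> " + local.getHostAddress();
        } catch (UnknownHostException e) {
            // Same exception type the consumer startup failed with.
            return "FAILED: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String report = checkLocalHost();
        System.out.println(report);
        // Non-zero exit code so a deployment script can notice the problem.
        if (report.startsWith("FAILED")) {
            System.exit(1);
        }
    }
}
```

If this prints FAILED on a new instance, the consumer will most likely hit the 
same UnknownHostException there.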

(Probably a combination of factors caused this exception to be completely 
swallowed, but I think that’s a different topic)

Cos  


On Friday, 8 January 2016 at 11:16, Cosmin Marginean wrote:

> Hi Marko, that seems to have solved it. I’m dealing with another issue now, 
> which I’ll report separately.
> Thank you for your help!
>  
> Cheers
> Cos
>  
>  
> On Friday, 8 January 2016 at 09:27, Cosmin Marginean wrote:
>  
> > Hi Marko, I will migrate the code and also change the timeout. Thanks for 
> > your suggestions. I’ll post a status update once I’ve tested.
> >  
> > Cheers
> > Cos
> >  
> >  
> > On Thursday, 7 January 2016 at 22:59, Marko Bonaći wrote:
> >  
> > > Actually, why don't you use the same code as outlined here (that includes
> > > timeout in props):
> > > http://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
> > >  
> > > Marko Bonaći
> > > Monitoring | Alerting | Anomaly Detection | Centralized Log Management
> > > Solr & Elasticsearch Support
> > > Sematext <http://sematext.com/> | Contact
> > > <http://sematext.com/about/contact.html>
> > >  
> > > On Thu, Jan 7, 2016 at 11:55 PM, Marko Bonaći <marko.bon...@sematext.com>
> > > wrote:
> > >  
> > > > Hi Cosmin,
> > > > do you have the default server configuration on these new nodes you're 
> > > > setting up?
> > > > I'd check the consumer's socket.timeout.ms, maybe someone set it to 30
> > > > instead of 30 000 :)
> > > > I'm speaking from my own experience (I had the same symptom and this
> > > > turned out to be the cause).
> > > >  
> > > > Marko Bonaći
> > > >  
> > > > On Thu, Jan 7, 2016 at 11:23 PM, Cosmin Marginean <cosmargin...@gmail.com>
> > > > wrote:
> > > >  
> > > > > Hi
> > > > >  
> > > > > I have a straightforward piece of code that creates a consumer (Kafka
> > > > > 0.9.0.0).
> > > > >  
> > > > > Properties props = new Properties();
> > > > > props.put("zookeeper.connect", zookeeperServers);
> > > > > props.put(org.apache.kafka.clients.consumer.ConsumerConfig.GROUP_ID_CONFIG, groupId);
> > > > > log.info("Starting consumer group for topic {} and group ID {}. Zookeeper servers: {}",
> > > > >         topic, groupId, zookeeperServers);
> > > > > consumer = kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
> > > > > log.info("Consumer group started for topic {} and group ID {}", topic, groupId);
> > > > >  
> > > > > We’ve run this countless times without any issues, but now we’re 
> > > > > deploying a new environment (AWS, just like the ones before) and it 
> > > > > appears that the client Java process dies entirely (without any 
> > > > > logs, crash report, etc.). This happens right after logging 
> > > > > “Starting consumer group..”, so presumably when it calls 
> > > > > createJavaConsumerConnector.
> > > > >  
> > > > > Admittedly, this might be “environmental”, but even though we 
> > > > > triple-checked everything (network setup, Kafka logs, Zookeeper logs, 
> > > > > etc.), we haven’t identified anything suspicious yet. So what I'd like 
> > > > > to know is whether there’s a way to add further Kafka 
> > > > > diagnostics/logging. Attached (trace-logging.txt) is the output after 
> > > > > turning everything up to TRACE; at the top you can see the message 
> > > > > “Starting consumer…”, but nothing really suspicious as far as I can tell.
> > > > >  
> > > > >  
> > > > > As an additional piece of information, Zookeeper reports the 
> > > > > following when this happens:
> > > > >  
> > > > > 2016-01-07 21:58:44,763 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
> > > > > EndOfStreamException: Unable to read additional data from client sessionid 0x1521e14797c0001, likely client has closed socket
> > > > > at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> > > > > at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> > > > > at java.lang.Thread.run(Thread.java:745)
> > > > > 2016-01-07 21:58:44,764 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.100.101.159:41613 which had sessionid 0x1521e14797c0001
> > > > >  
> > > > >  
> > > > > Any suggestions would be appreciated.
> > > > >  
> > > > > Thank you
> > > > >  
> > > > > Cosmin  
> >  
>  
