Thanks for reporting this issue. I agree that this is a problem for Kafka users running on AWS. Could you please open a JIRA so we can keep track of it?
On Sun, Aug 17, 2014 at 11:41 AM, Bae, Jae Hyeon <metac...@gmail.com> wrote:
> Recently, we found a serious ZkClient bug (actually an Apache ZooKeeper
> client bug) that can bring down brokers and consumers during a ZooKeeper
> push.
>
> We're running Kafka and ZooKeeper in an AWS EC2 environment. ZooKeeper
> instances are bound to EIPs to give each instance a static hostname,
> which means that even if an EC2 instance is terminated and replaced with
> a new one, it keeps the same hostname, but the private IP bound to that
> hostname can change.
>
> The scenario is this: if we do a rolling push of all ZooKeeper server
> instances, terminating each one and waiting until its replacement joins
> the quorum, one by one, ZkClient eventually tries to connect to the old
> IP addresses, which no longer exist, because of DNS caching on the Apache
> ZooKeeper client side. Please refer to
> https://issues.apache.org/jira/browse/ZOOKEEPER-338
>
> So we need to restart Kafka brokers and consumers to refresh the DNS
> cache. To solve this problem, I sent the following pull request to
> ZkClient:
> https://github.com/sgroschupf/zkclient/pull/26
>
> Please review the above PR. If a new version of ZkClient with this fix
> is not released in time for the Kafka 0.8.2 release, I'd like Kafka to
> ship an internally built ZkClient with the fix. I would really
> appreciate it.
>
> Thank you
> Best, Jae
>
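To make the failure mode concrete: as the report describes, the ZooKeeper client keeps using server addresses it resolved earlier, so a hostname that now points at a replacement instance's new IP is never looked up again. A minimal sketch of the general idea of re-resolving hostnames on every (re)connect attempt might look like the Java below. `ZkHostResolver` and `resolveFresh` are hypothetical names, not part of the ZkClient or ZooKeeper API, and this is not the change in the pull request above; it assumes a `host:port[,host:port...][/chroot]` connect string with an explicit port on every entry.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

public class ZkHostResolver {

    // Hypothetical helper, not part of ZkClient or ZooKeeper. Parses a
    // connect string like "zk1:2181,zk2:2181/kafka" and looks every hostname
    // up again on each call, so an instance replaced behind the same hostname
    // yields its current IP instead of an address captured at startup.
    public static List<InetSocketAddress> resolveFresh(String connectString) {
        // Drop an optional chroot suffix such as "/kafka".
        int slash = connectString.indexOf('/');
        String hostList = slash >= 0 ? connectString.substring(0, slash) : connectString;

        List<InetSocketAddress> result = new ArrayList<>();
        for (String hostPort : hostList.split(",")) {
            String entry = hostPort.trim();
            // Assumes every entry carries an explicit ":port".
            int colon = entry.lastIndexOf(':');
            String host = entry.substring(0, colon);
            int port = Integer.parseInt(entry.substring(colon + 1));
            try {
                // New lookup on each call (still subject to the JVM's own InetAddress cache).
                for (InetAddress addr : InetAddress.getAllByName(host)) {
                    result.add(new InetSocketAddress(addr, port));
                }
            } catch (UnknownHostException e) {
                // Assumed policy: skip a host that does not resolve right now
                // (e.g. mid-replacement) and let the caller retry later.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Bound the JVM's positive DNS cache so repeated lookups can observe a
        // changed record; this should be set before the first lookup in the process.
        Security.setProperty("networkaddress.cache.ttl", "60");
        // Intended use: call resolveFresh on every (re)connect attempt, not once at startup.
        System.out.println(resolveFresh("localhost:2181/kafka"));
    }
}
```

Note that the JVM's own cache (`networkaddress.cache.ttl`) is a separate layer: bounding it alone does not help if the client never performs a new lookup, which is why the sketch resolves again on each call.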