Additional info:

Kafka version: 0.8.2.1
ZooKeeper: 3.4.6

On Wed, 8 Jul 2015 at 20:07 tao xiao <xiaotao...@gmail.com> wrote:
> Hi team,
>
> I have 10 high-level consumers connecting to Kafka, and one of them kept
> complaining about a "conflicted ephemeral node" for about 8 hours. The log
> was filled with the exception below:
>
> [2015-07-07 14:03:51,615] INFO conflict in
> /consumers/group/ids/test-1435856975563-9a9fdc6c data:
> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}
> stored data:
> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275558570"}
> (kafka.utils.ZkUtils$)
> [2015-07-07 14:03:51,616] INFO I wrote this conflicted ephemeral node
> [{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}]
> at /consumers/group/ids/test-1435856975563-9a9fdc6c a while back in a
> different session, hence I will backoff for this node to be deleted by
> Zookeeper and retry (kafka.utils.ZkUtils$)
>
> In the meantime, ZooKeeper reported the exception below for the same time
> span:
>
> 2015-07-07 22:45:09,687 [myid:3] - INFO [ProcessThread(sid:3
> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException
> when processing sessionid:0x44e657ff19c0019 type:create cxid:0x7a26
> zxid:0x3015f6e77 txntype:-1 reqpath:n/a Error
> Path:/consumers/group/ids/test-1435856975563-9a9fdc6c Error:KeeperErrorCode
> = NodeExists for /consumers/group/ids/test-1435856975563-9a9fdc6c
>
> In the end, ZooKeeper timed out the session and the consumers triggered a
> rebalance.
>
> I know that the conflicted ephemeral node warning exists to handle a
> ZooKeeper bug where session expiration and ephemeral node deletion are not
> done atomically. However, as the ZooKeeper log indicates, ZooKeeper never
> got a chance to delete the ephemeral node, which made me think that the
> session had not expired at that time. And for some reason ZooKeeper fired a
> session expire event, which subsequently invoked ZKSessionExpireListener.
> I was just wondering if anyone has encountered a similar issue before, and
> what I can do on the ZooKeeper side to prevent this.
>
> Another problem is that the createEphemeralPathExpectConflictHandleZKBug
> call is wrapped in a while(true) loop, which runs forever until the
> ephemeral node is created. Would it be better to employ an exponential
> retry policy with a maximum number of retries, so that it has a chance to
> re-throw the exception back to the caller and let the caller handle it in
> a situation like the one above?
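
To make the retry suggestion above concrete, here is a minimal sketch of a bounded exponential backoff in Java. It is not Kafka's actual code: the class name `BoundedRetry`, the method `retryWithBackoff`, and all parameters are hypothetical names chosen for illustration. It also retries on any exception, whereas a real change would presumably retry only on the node-exists conflict.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Kafka or ZooKeeper.
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Calls op until it succeeds or maxAttempts attempts have been made.
     * The sleep between attempts starts at baseDelayMs and doubles after
     * each failure, capped at maxDelayMs. After the last failed attempt
     * the exception is re-thrown so the caller can decide what to do.
     */
    static <T> T retryWithBackoff(Callable<T> op, int maxAttempts,
                                  long baseDelayMs, long maxDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = Math.min(baseDelayMs, maxDelayMs);
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Never swallow an interrupt; hand it straight to the caller.
                throw e;
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // retries exhausted: let the caller handle it
                }
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
    }
}
```

With something like this, the ephemeral node creation could be passed in as `op`, and a consumer stuck in the state described above would eventually surface the failure instead of looping forever.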