We saw the error again in our cluster. Has anyone seen the same issue before?

On Fri, 10 Jul 2015 at 13:26 tao xiao <xiaotao...@gmail.com> wrote:
> Bump the thread. Any help would be appreciated.
>
> On Wed, 8 Jul 2015 at 20:09 tao xiao <xiaotao...@gmail.com> wrote:
>
>> Additional info
>> Kafka version: 0.8.2.1
>> zookeeper: 3.4.6
>>
>> On Wed, 8 Jul 2015 at 20:07 tao xiao <xiaotao...@gmail.com> wrote:
>>
>>> Hi team,
>>>
>>> I have 10 high-level consumers connecting to Kafka, and one of them kept
>>> complaining about "conflicted ephemeral node" for about 8 hours. The log was
>>> filled with the exception below:
>>>
>>> [2015-07-07 14:03:51,615] INFO conflict in
>>> /consumers/group/ids/test-1435856975563-9a9fdc6c data:
>>> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}
>>> stored data:
>>> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275558570"}
>>> (kafka.utils.ZkUtils$)
>>> [2015-07-07 14:03:51,616] INFO I wrote this conflicted ephemeral node
>>> [{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}]
>>> at /consumers/group/ids/test-1435856975563-9a9fdc6c a while back in a
>>> different session, hence I will backoff for this node to be deleted by
>>> Zookeeper and retry (kafka.utils.ZkUtils$)
>>>
>>> In the meantime, ZooKeeper reported the exception below over the same time span:
>>>
>>> 2015-07-07 22:45:09,687 [myid:3] - INFO [ProcessThread(sid:3
>>> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException
>>> when processing sessionid:0x44e657ff19c0019 type:create cxid:0x7a26
>>> zxid:0x3015f6e77 txntype:-1 reqpath:n/a Error
>>> Path:/consumers/group/ids/test-1435856975563-9a9fdc6c Error:KeeperErrorCode
>>> = NodeExists for /consumers/group/ids/test-1435856975563-9a9fdc6c
>>>
>>> In the end, ZooKeeper timed out the session and the consumers triggered a
>>> rebalance.
>>>
>>> I know the "conflicted ephemeral node" warning is there to handle a
>>> ZooKeeper bug where session expiration and ephemeral node deletion are not
>>> done atomically, but as the ZooKeeper log indicates, ZooKeeper never got a
>>> chance to delete the ephemeral node, which makes me think the session had
>>> not expired at that time. Yet for some reason ZooKeeper fired a session
>>> expiration event, which subsequently invoked ZKSessionExpireListener. I was
>>> just wondering whether anyone has encountered a similar issue before and
>>> what I can do on the ZooKeeper side to prevent it.
>>>
>>> Another problem is that the createEphemeralPathExpectConflictHandleZKBug
>>> call is wrapped in a while(true) loop that runs forever until the
>>> ephemeral node is created. Would it be better to employ an exponential
>>> retry policy with a maximum number of retries, so that the exception can be
>>> re-thrown to the caller and the caller can handle situations like the one
>>> above?
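>>>
>>> For illustration, something along these lines (just a rough Scala sketch,
>>> not the actual ZkUtils code; createNode, the object name, and the exception
>>> type below are placeholders for whatever the real creation attempt and
>>> error handling would be):
>>>
>>>   import scala.annotation.tailrec
>>>
>>>   object BoundedEphemeralRetry {
>>>
>>>     // Thrown back to the caller once the bounded retries are exhausted.
>>>     class EphemeralNodeCreationException(msg: String) extends RuntimeException(msg)
>>>
>>>     // createNode returns true once the ephemeral node has been created.
>>>     def createWithBoundedRetry(createNode: () => Boolean,
>>>                                maxRetries: Int = 5,
>>>                                initialBackoffMs: Long = 500): Unit = {
>>>       @tailrec
>>>       def attempt(retry: Int, backoffMs: Long): Unit = {
>>>         if (createNode()) {
>>>           ()                                // node created, done
>>>         } else if (retry >= maxRetries) {
>>>           // Give up and let the caller decide, e.g. shut down or rebalance.
>>>           throw new EphemeralNodeCreationException(
>>>             s"Ephemeral node still conflicted after $maxRetries retries")
>>>         } else {
>>>           Thread.sleep(backoffMs)           // wait for ZooKeeper to expire the old session
>>>           attempt(retry + 1, backoffMs * 2) // exponential backoff
>>>         }
>>>       }
>>>       attempt(0, initialBackoffMs)
>>>     }
>>>   }
>>>
>>> That way the consumer is not stuck silently in the loop forever, and the
>>> caller can surface the failure instead.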