We saw the error again in our cluster. Has anyone run into the same issue before?

On Fri, 10 Jul 2015 at 13:26 tao xiao <xiaotao...@gmail.com> wrote:

> Bump the thread. Any help would be appreciated.
>
> On Wed, 8 Jul 2015 at 20:09 tao xiao <xiaotao...@gmail.com> wrote:
>
>> Additional info:
>> Kafka version: 0.8.2.1
>> ZooKeeper: 3.4.6
>>
>> On Wed, 8 Jul 2015 at 20:07 tao xiao <xiaotao...@gmail.com> wrote:
>>
>>> Hi team,
>>>
>>> I have 10 high-level consumers connecting to Kafka, and one of them
>>> kept complaining about "conflicted ephemeral node" for about 8 hours.
>>> The log was filled with the exception below:
>>>
>>> [2015-07-07 14:03:51,615] INFO conflict in
>>> /consumers/group/ids/test-1435856975563-9a9fdc6c data:
>>> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}
>>> stored data:
>>> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275558570"}
>>> (kafka.utils.ZkUtils$)
>>> [2015-07-07 14:03:51,616] INFO I wrote this conflicted ephemeral node
>>> [{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}]
>>> at /consumers/group/ids/test-1435856975563-9a9fdc6c a while back in a
>>> different session, hence I will backoff for this node to be deleted by
>>> Zookeeper and retry (kafka.utils.ZkUtils$)
>>>
>>> In the meantime, ZooKeeper reported the exception below over the same
>>> time span:
>>>
>>> 2015-07-07 22:45:09,687 [myid:3] - INFO  [ProcessThread(sid:3
>>> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException
>>> when processing sessionid:0x44e657ff19c0019 type:create cxid:0x7a26
>>> zxid:0x3015f6e77 txntype:-1 reqpath:n/a Error
>>> Path:/consumers/group/ids/test-1435856975563-9a9fdc6c Error:KeeperErrorCode
>>> = NodeExists for /consumers/group/ids/test-1435856975563-9a9fdc6c
>>>
>>> In the end, ZooKeeper timed out the session and the consumers
>>> triggered a rebalance.
>>>
>>> I know the conflicted ephemeral node warning exists to handle a
>>> ZooKeeper bug in which session expiration and ephemeral node deletion
>>> are not done atomically. But as the ZooKeeper log indicates, ZooKeeper
>>> never got a chance to delete the ephemeral node, which makes me think
>>> the session had not actually expired at that time. Yet for some reason
>>> ZooKeeper fired a session expiration event, which subsequently invoked
>>> ZKSessionExpireListener. Has anyone encountered a similar issue, and
>>> is there anything I can do on the ZooKeeper side to prevent this?
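>>>
>>> For reference, here is where that warning comes from, simplified from
>>> my reading of the 0.8.2.1 kafka.utils.ZkUtils source (comments mine):
>>>
>>> def createEphemeralPathExpectConflictHandleZKBug(
>>>     zkClient: ZkClient, path: String, data: String,
>>>     expectedCallerData: Any, checker: (String, Any) => Boolean,
>>>     backoffTime: Int): Unit = {
>>>   while (true) { // loops forever until the node is created
>>>     try {
>>>       createEphemeralPathExpectConflict(zkClient, path, data)
>>>       return
>>>     } catch {
>>>       case e: ZkNodeExistsException =>
>>>         readDataMaybeNull(zkClient, path)._1 match {
>>>           case Some(written) if checker(written, expectedCallerData) =>
>>>             // We wrote this node in an earlier session: back off and
>>>             // wait for ZooKeeper to delete it. This logs the
>>>             // "conflicted ephemeral node" line we keep seeing.
>>>             Thread.sleep(backoffTime)
>>>           case Some(_) => throw e // node owned by someone else
>>>           case None => // node vanished in the meantime; retry now
>>>         }
>>>     }
>>>   }
>>> }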
>>>
>>> Another problem is that createEphemeralPathExpectConflictHandleZKBug
>>> is wrapped in a while(true) loop that runs forever until the ephemeral
>>> node is created. Would it be better to employ an exponential retry
>>> policy with a maximum number of retries, so that the exception can be
>>> re-thrown to the caller and handled there in situations like the one
>>> above? A sketch of what I have in mind follows below.
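>>>
>>> Something like the following (a hypothetical sketch only, not a
>>> patch; the method name, maxRetries, and initialBackoffMs are made up
>>> for illustration):
>>>
>>> import org.I0Itec.zkclient.ZkClient
>>> import org.I0Itec.zkclient.exception.ZkNodeExistsException
>>>
>>> // Bounded variant: retry with exponential backoff and re-throw to
>>> // the caller once the retry budget is exhausted, instead of
>>> // spinning in while(true) forever.
>>> def createEphemeralPathWithBoundedRetries(
>>>     zkClient: ZkClient, path: String, data: String,
>>>     maxRetries: Int = 10, initialBackoffMs: Long = 100): Unit = {
>>>   var backoffMs = initialBackoffMs
>>>   for (attempt <- 1 to maxRetries) {
>>>     try {
>>>       createEphemeralPathExpectConflict(zkClient, path, data)
>>>       return
>>>     } catch {
>>>       case e: ZkNodeExistsException =>
>>>         if (attempt == maxRetries)
>>>           throw e // give up; let the caller decide how to handle it
>>>         Thread.sleep(backoffMs)
>>>         backoffMs = math.min(backoffMs * 2, 30000L) // cap at 30s
>>>     }
>>>   }
>>> }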
>>>
>>>
