Julien
-----Original Message-----
From: Karl Wright
Sent: Wednesday, January 12, 2022 02:35
To: dev
Subject: Re: Zookeeper locks issue
If you are using existing connectors we are shipping, then all you need to do
is tell us which ones are involved in your entire job pipeline.
If you have any
> Hi,
> would you also want to have a look at the disconnect() method ?
>
>
> Cedric
>
>
>
> From: Cedric Ulmer
> Sent: Thursday, January 6, 2022 16:35:55
> To: dev
> Subject: RE: Zookeeper locks issue
>
> Hi Karl,
>
>
> to bounce back on this issue, since you asked about how we initiate
Hi,
would you also want to have a look at the disconnect() method ?
Cedric
From: Cedric Ulmer
Sent: Thursday, January 6, 2022 16:35:55
To: dev
Subject: RE: Zookeeper locks issue
Hi Karl,
to bounce back on this issue, since you asked about how we initiate
From: Karl Wright
Sent: Friday, December 10, 2021 15:32
To: dev
Subject: Re: Zookeeper locks issue
You haven't told me anything about the connectors involved in this job.
For every connector, there are connection pools of limited sizes,
partitioned by configuration
> Sent: Thursday, December 9, 2021 15:37
> To: dev
> Subject: Re: Zookeeper locks issue
>
> The fact that you only see this on one job is pretty clearly evidence that
> we are seeing a hang of some kind due to something a specific connector or
> connection is doing.
>
> I'm going to have to guess wildly here to focus us on a productive path.
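Karl's point above about fixed-size connection pools partitioned by configuration can be sketched with a toy model (class and method names here are illustrative, not ManifoldCF's actual API); it also shows how a single leaked connection can leave later acquirers stuck:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Toy model of pools "of limited sizes, partitioned by configuration":
// each distinct configuration key gets its own bounded pool. A connection
// that is taken and never returned permanently shrinks that key's pool,
// which is how one misbehaving connector can stall a whole job.
public class KeyedConnectionPool {
    private final int maxPerKey;
    private final Map<String, Semaphore> slots = new ConcurrentHashMap<>();

    public KeyedConnectionPool(int maxPerKey) {
        this.maxPerKey = maxPerKey;
    }

    /** Try to take a connection slot for this configuration; non-blocking. */
    public boolean tryAcquire(String configKey) {
        return slots.computeIfAbsent(configKey, k -> new Semaphore(maxPerKey))
                    .tryAcquire();
    }

    /** Return a slot; a forgotten release exhausts the pool for that key. */
    public void release(String configKey) {
        Semaphore s = slots.get(configKey);
        if (s != null) {
            s.release();
        }
    }
}
```

In the real system a blocked acquirer waits rather than failing fast, which is what turns a leak into the hang described above.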
Sent: Thursday, December 9, 2021 15:37
To: dev
Subject: Re: Zookeeper locks issue
The fact that you only see this on one job is pretty clearly evidence that we
are seeing a hang of some kind due to something a specific connector or
connection is doing.
I'm going to have to guess wildly here to focus us on a productive path.
What I want to rule out is a case where the connector
Actually, I have several jobs, but only one job is running at a time,
and currently the error always happens on the same one. The problem is
that I can't access the environment in debug mode, and I can't activate
debug logging because I am limited in log size, so the only thing I can do
is to
The large number of connections can happen but usually that means something
is stuck somewhere and there is a "train wreck" of other locks getting
backed up.
If this is completely repeatable then I think we have an opportunity to
figure out why this is happening. One thing that is clear is that
Ok that makes sense. But still, I don't understand how the "Can't
release lock we don't hold" exception can happen, knowing for sure that
neither the Zookeeper process nor the MCF agent process has been down
and/or restarted. Not sure that increasing the session lifetime would
solve that
What this code is doing is interpreting exceptions back from Zookeeper.
There are some kinds of exceptions it interprets as "session has expired",
so it rebuilds the session.
The code is written in such a way that the locks are presumed to persist
beyond the session. In fact, if they do not
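A minimal simulation of the failure mode Karl describes (all names hypothetical; no real ZooKeeper involved): an ephemeral lock node lives only as long as the session that created it, so if the session expires and is quietly rebuilt, a later release finds nothing to remove and raises exactly the exception from the log.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of ephemeral-node locking (no real ZooKeeper): nodes created
// under one session vanish when that session expires, even though the code
// that took the lock still believes it holds it.
public class EphemeralLockModel {
    // Ephemeral nodes belonging to the current session.
    private final Set<String> nodes = new HashSet<>();

    public void obtainLock(String path) {
        nodes.add(path);
    }

    /** Session expiry + automatic rebuild: all ephemeral nodes are gone. */
    public void expireAndRebuildSession() {
        nodes.clear();
    }

    public void releaseLock(String path) {
        if (!nodes.remove(path)) {
            // Same wording as the FATAL entry quoted in this thread.
            throw new IllegalStateException("Can't release lock we don't hold");
        }
    }
}
```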
Karl,
I tried to understand the Zookeeper lock logic in the code, and the only
thing I don't understand is the 'handleEphemeralNodeKeeperException'
method that is called in the catch(KeeperException e) of every
obtain/release lock method of the ZookeeperConnection class.
This method sets
Yes, I will then try the patch and see if it works
Regards,
Julien
On 07/12/2021 at 14:28, Karl Wright wrote:
Yes, this is plausible. But I'm not sure what the solution is. If a
zookeeper session disappears, according to the documentation everything
associated with that session should also disappear.
So I guess we could catch this error and just ignore it, assuming that the
session must be gone anyway?
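Karl's suggestion of catching the error on release and ignoring it, on the assumption that the session (and with it the ephemeral node) is already gone, would look roughly like this; `releaseQuietly` is a hypothetical helper, not code from ZookeeperConnection:

```java
// Sketch of the proposed workaround: treat "Can't release lock we don't
// hold" on release as benign, since an expired session has already
// destroyed the ephemeral node, so there is nothing left to release.
public class QuietRelease {
    /** Run a release action, swallowing only the lost-lock error. */
    public static boolean releaseQuietly(Runnable release) {
        try {
            release.run();
            return true;                 // released normally
        } catch (IllegalStateException e) {
            return false;                // lock was already gone; ignore
        }
    }
}
```

The risk of this approach is masking a genuine double-release bug, which is why the thread keeps probing for the root cause instead of stopping here.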
Hi,
the Zookeeper lock error mentioned in the second-to-last comment of this
issue https://issues.apache.org/jira/browse/CONNECTORS-1447:
FATAL 2017-08-04 09:28:25,855 (Agents idle cleanup thread) - Error tossed:
Can't release lock we don't hold
java.lang.IllegalStateException: Can't release lock we don't hold