RE: Zookeeper locks issue

2022-01-12 Thread Julien Massiera
, Julien -Original Message- From: Karl Wright Sent: Wednesday, January 12, 2022 02:35 To: dev Subject: Re: Zookeeper locks issue If you are using existing connectors we are shipping, then all you need to do is tell us which ones are involved in your entire job pipeline. If you have any

Re: Zookeeper locks issue

2022-01-11 Thread Karl Wright
ect() method? > > > Cedric > > > > From: Cedric Ulmer > Sent: Thursday, January 6, 2022 16:35:55 > To: dev > Subject: RE: Zookeeper locks issue > > Hi Karl, > > > to bounce back on this issue, since you asked about how we initiat

RE: Zookeeper locks issue

2022-01-11 Thread Cedric Ulmer
Hi, would you also like to have a look at the disconnect() method? Cedric From: Cedric Ulmer Sent: Thursday, January 6, 2022 16:35:55 To: dev Subject: RE: Zookeeper locks issue Hi Karl, to follow up on this issue, since you asked about how we initiate

RE: Zookeeper locks issue

2022-01-06 Thread Cedric Ulmer
_ From: Karl Wright Sent: Friday, December 10, 2021 15:32 To: dev Subject: Re: Zookeeper locks issue You haven't told me anything about the connectors involved in this job. For every connector, there are connection pools of limited sizes, partitioned by configu

Re: Zookeeper locks issue

2021-12-10 Thread Karl Wright
Sent: Thursday, December 9, 2021 15:37 > To: dev > Subject: Re: Zookeeper locks issue > > The fact that you only see this on one job is pretty clearly evidence that > we are seeing a hang of some kind due to something a specific connector or > connection is doing. > > I'm going

RE: Zookeeper locks issue

2021-12-10 Thread julien.massiera
15:37 To: dev Subject: Re: Zookeeper locks issue The fact that you only see this on one job is pretty clearly evidence that we are seeing a hang of some kind due to something a specific connector or connection is doing. I'm going to have to guess wildly here to focus us on a productive path. What I

Re: Zookeeper locks issue

2021-12-09 Thread Karl Wright
The fact that you only see this on one job is pretty clearly evidence that we are seeing a hang of some kind due to something a specific connector or connection is doing. I'm going to have to guess wildly here to focus us on a productive path. What I want to rule out is a case where the connector

Re: Zookeeper locks issue

2021-12-09 Thread Julien Massiera
Actually, I have several jobs, but only one job is running at a time, and currently the error always happens on the same one. The problem is that I can't access the environment in debug mode, and I can't enable debug logging because I am limited in log size, so the only thing I can do is to

Re: Zookeeper locks issue

2021-12-09 Thread Karl Wright
A large number of connections can happen, but usually that means something is stuck somewhere and there is a "train wreck" of other locks getting backed up. If this is completely repeatable, then I think we have an opportunity to figure out why this is happening. One thing that is clear is that

Re: Zookeeper locks issue

2021-12-07 Thread Julien Massiera
OK, that makes sense. But still, I don't understand how the "Can't release lock we don't hold" exception can happen, knowing for sure that neither the Zookeeper process nor the MCF agent process has been down and/or restarted. Not sure that increasing the session lifetime would solve that

Re: Zookeeper locks issue

2021-12-07 Thread Karl Wright
What this code is doing is interpreting exceptions back from Zookeeper. There are some kinds of exceptions it interprets as "session has expired", so it rebuilds the session. The code is written in such a way that the locks are presumed to persist beyond the session. In fact, if they do not

Re: Zookeeper locks issue

2021-12-07 Thread Julien Massiera
Karl, I tried to understand the Zookeeper lock logic in the code, and the only thing I don't understand is the 'handleEphemeralNodeKeeperException' method that is called in the catch(KeeperException e) of every obtain/release lock method of the ZookeeperConnection class. This method sets

Re: Zookeeper locks issue

2021-12-07 Thread Julien Massiera
Yes, I will try the patch then and see if it works. Regards, Julien On 07/12/2021 at 14:28, Karl Wright wrote: Yes, this is plausible. But I'm not sure what the solution is. If a zookeeper session disappears, according to the documentation everything associated with that session

Re: Zookeeper locks issue

2021-12-07 Thread Karl Wright
Yes, this is plausible. But I'm not sure what the solution is. If a zookeeper session disappears, according to the documentation everything associated with that session should also disappear. So I guess we could catch this error and just ignore it, assuming that the session must be gone anyway?
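
The "catch this error and just ignore it" idea above can be sketched roughly as follows. This is only an illustration, not ManifoldCF code: the class TolerantRelease, the SessionLocks stand-in and the releaseQuietly helper are hypothetical names, and session loss is simulated in memory rather than through a real Zookeeper client.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration only; none of these names come from ManifoldCF. */
public class TolerantRelease {

    /** Simulated table of the locks the current session believes it holds. */
    static class SessionLocks {
        private final Set<String> held = new HashSet<>();

        void obtain(String name) {
            held.add(name);
        }

        /** Simulates session loss: everything tied to the session disappears. */
        void expireSession() {
            held.clear();
        }

        /** Strict release: throws if the lock is not held by this session. */
        void release(String name) {
            if (!held.remove(name)) {
                throw new IllegalStateException("Can't release lock we don't hold");
            }
        }
    }

    /**
     * Releases a lock, treating "not held" as "already gone".
     * Returns true if the lock was actually released, false otherwise.
     */
    static boolean releaseQuietly(SessionLocks locks, String name) {
        try {
            locks.release(name);
            return true;
        } catch (IllegalStateException e) {
            // Assumption: the session expired and took the lock with it,
            // so there is nothing left to release.
            return false;
        }
    }

    public static void main(String[] args) {
        SessionLocks locks = new SessionLocks();

        locks.obtain("job-1");
        System.out.println(releaseQuietly(locks, "job-1")); // normal release

        locks.obtain("job-1");
        locks.expireSession();                              // session vanishes
        System.out.println(releaseQuietly(locks, "job-1")); // tolerated, no throw
    }
}
```

One trade-off of this approach is that swallowing the exception also hides genuine double-release bugs, so logging the ignored case would probably be worthwhile.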

Zookeeper locks issue

2021-12-07 Thread Julien Massiera
Hi, the Zookeeper lock error mentioned in the second-to-last comment of this issue https://issues.apache.org/jira/browse/CONNECTORS-1447: FATAL 2017-08-04 09:28:25,855 (Agents idle cleanup thread) - Error tossed: Can't release lock we don't hold java.lang.IllegalStateException: Can't release lock