Hi Jeremy and Rabih, I think the problem is in the way you have constructed your Session class.
The fact that you have a proton::session object "m_session" embedded in your Session object, and the Session object's destructor is called in the non-callback thread, means that you are making an unsafe call to m_session's destructor from that thread. The counting problem that you see is not the sole effect that this destructor has, and making a few operations atomic can't fix the thread safety problem on its own. One solution would be to use a smart pointer to the Proton object, for example std::shared_ptr<proton::session> m_session; And have a call to m_session.reset(); in the Session::on_session_close() callback (or the lambda it calls, as your code is structured). As long as your sessionClosed promise isn't completed/fulfilled until after the pointer reset, your Session object can subsequently go out of scope safely. I'm not saying that this is the one true way to tackle the problem. But you do need to ensure that the Proton destructor is not called in the non-callback thread and that there is a thread safe way for the non-callback thread to "know" what it can and cannot do with any referenced Proton objects it deals with, or a way to defer an action until it is safe. To be boringly complete, I will admit that the "workaround" you used // !!!!!! without the following line we encounter random segfaults !!!!!! sessionHandler.m_session = proton::session(); has almost the exact same effect as the shared_ptr suggestion above. However, that's not guaranteed to never change (however unlikely) in Proton's internals. And standard smart pointer usage is easier to remember and read in code than remembering Proton's "rules". I will try to capture some of this in a JIRA for enhanced documentation. Cliff On Wed, Apr 24, 2019 at 5:59 AM Rabih M <rabih.prom...@gmail.com> wrote: > > Hello, > > We manage to reproduce the bug in a simplified use case. > Please find attached the "Broker.hpp" along with the test "Main.cpp". > The usage was inspired from the proton C++ multi-threaded examples and the > doc: > https://github.com/apache/qpid-proton/blob/master/cpp/docs/mt.md#thread-safety-rules. > Each of container, connection, and session have different handlers. > The connection and session handlers are destroyed when they go out of scope. > Handlers hold a copy on the proton objects as members. > > If we run the test without adding the line 142 (which releases the proton > session) we have race condition that leads to a segfault. > > From the cores, it is always segfaulting on the ref counting. > This is why we proposed the atomic types in C11. > > Best regards. > Jeremy and Rabih > > > On Tue, Apr 23, 2019 at 9:28 AM Cliff Jansen <cliffjan...@gmail.com> wrote: >> >> I am of the same opinion as Gordon that fixing the counting (to be atomic) >> will only address the current symptom without preventing other races that >> will present themselves later or on other hardware architectures. >> >> Having two threads participating in Proton object reference counting is >> problematic as you can't guarantee which ones will end up running the >> finalizers (or resurectors if the reference count increases during >> finalization). >> >> The simplest way to ensure thread safety is to abstain from all direct >> (except for the handful of documented thread-safe interfaces) and indirect >> calls into the Proton engine from any non-callback thread. In C++ the >> latter is more difficult, as there can be hidden reference counting if you >> have Proton objects in your application objects. There you have be certain >> that you never have such objects being copied or destroyed in a >> non-callback thread. Alternatively, you can use dumb pointers to the >> Proton objects and use some application mechanism (mutex, condition >> variable...) to allow the non-callback thread to know when the pointer to >> the Proton object is valid. >> >> Cliff >> >> On Mon, Apr 22, 2019 at 1:49 AM Gordon Sim <g...@redhat.com> wrote: >> >> > On 19/04/2019 11:02 am, Rabih M wrote: >> > > We are only using proton objects in the handler, however, at destruction >> > > time, there is a race condition on the reference counter between the >> > proton >> > > objects from the main thread that are being destroyed, and the proton >> > > objects held by the proton thread. >> > >> > Can you elaborate a little on this? Access to (or destruction of) >> > objects from distinct connection should not need to be coordinated, >> > though objects for the same connection clearly have to be. >> > >> > Is the issue you are seeing related to some global state (e.g. sasl >> > destruction)? >> > >> > What calls are being made that are in a race? Is this plain C or c++ >> > with implicit calls being made by destructors? >> > >> > > The reference counter not being atomic is very error prone for users. >> > > How about transforming the int to an atomic int (already provided in C11 >> > > standard: https://en.cppreference.com/w/c/language/atomic)? >> > >> > My initial thought is that just making the counter atomic would not be >> > sufficient for general races, so it would be important to understand >> > what specific uses cases this was designed to protect. >> > >> > --------------------------------------------------------------------- >> > To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org >> > For additional commands, e-mail: users-h...@qpid.apache.org >> > >> > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org > For additional commands, e-mail: users-h...@qpid.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org For additional commands, e-mail: users-h...@qpid.apache.org