[Fwd: my ldap locking problem]

Dan Mosedale Wed, 21 Feb 2001 20:02:36 -0800

So we're not really concentrating on the problem that this message is 
written about.  But it _is_ some of the only documentation about the 
XPCOM wrapper code other than the IDL files and the LDAP C SDK doc set.

Dan

OK, so I've written up a bunch of text about how the LDAP code works and 
the particular problem I'm having related to calling out of module with 
a lock held.  I'd love to hear any suggestions you have related to this....

Thanks,.
Dan

A few caveats about the code:
-----------------------------
http://lxr.mozilla.org/seamonkey/source/directory/xpcom/TODO.txt is a
list of the known issues about the code. The issues most relevant to
this particular discussion are probably:

My current plan is to get rid of all the code in nsLDAPChannel.cpp for
INVOKE_LDAP_CALLBACKS_ON_MAIN_THREAD code (it defaults to 0 == off
already).

A lot of the code in xpcom/base/src is actually part of the XPCOM
wrapper and belongs there. But not all of it is:
nsLDAPProtocolHandler* and nsLDAPChannel* are clients of the XPCOM
wrapper and should be in their own directory/module. nsLDAPService is
part of the XPCOM wrapper, but it currently contains dead code; in the
not-too-distant future it will be resurrected to manage
sharing/caching of LDAP connections between multiple callers.

The code that's currently in nsLDAPChannel::Cancel has thread safety
problems, and is what I'm trying to fix.

How the LDAP code works
-----------------------
There is a connection object is used to keep track of each LDAP server
connection, and it is accessed through the nsILDAPConnection.idl interface:
<http://lxr.mozilla.org/seamonkey/source/directory/xpcom/base/public/nsILDAPConnection.idl>.

To use it, you create one and call the init() method on it. This will
cause the connection object to spin up a thread:

<http://lxr.mozilla.org/seamonkey/source/directory/xpcom/base/src/nsLDAPConnection.cpp#339>

which it uses to listen for operation results by looping around
ldap_result() (a call into the (threadsafe) LDAP C SDK).

Then you create an operation object, used through the
nsILDAPOperation.idl interface:

<http://lxr.mozilla.org/seamonkey/source/directory/xpcom/base/public/nsILDAPOperation.idl>.

The operation object is initialized by calling its init() method,
which tells it the connection it will execute on, and passes in an
nsILDAPMessageListener interface,

<http://lxr.mozilla.org/seamonkey/source/directory/xpcom/base/public/nsILDAPMessageListener.idl>,

which is where the results get called back to. After this, the actual
operation method to be executed (eg simpleBind, searchExt) is called,
which starts the operation. This actually adds the operation in
question to the connection's list of pending operations, and then
calls into the (threadsafe) LDAP C SDK to start the operation.

Whenever a message from the server arrives, the connection thread
figures out which operation it belongs to, and then invokes the
onLDAPMessage method of the associated nsILDAPMessageListener
callback. The callback is invoked on the connection thread, so that
it stays out of the way of UI stuff.

Worth mentioning is that unlike the other interfaces that have been
described so far, objects that implement the nsILDAPMessageListener
will generally be clients of the XPCOM wrapper for the LDAP C SDK, not
part of the wrapper.

nsLDAPChannel is an example such client, which is the protocol handler
for ldap: URLs. Like many nsIChannels
<http://lxr.mozilla.org/seamonkey/source/netwerk/base/public/nsIChannel.idl>
the interesting stuff starts to happen when its asyncRead method
<http://lxr.mozilla.org/seamonkey/source/directory/xpcom/base/src/nsLDAPChannel.cpp>
is called.

It creates an operation to simpleBind to the LDAP server, sets
mCurrentOperation to point to it, and then kicks it off. Then, when
the bind has completed, the nsLDAPChannel object gets a callback, in
which it creates, points mCurrentOperation to, and kicks of the
searchExt() operation itself.

When the channel is done, it needs to clean up the resources in use.
After the simpleBind is kicked off, almost all the remaining channel
code runs on the connection thread, meaning that almost all resources
are being used only from the connection thread. The only code that
might still be executed on the main thread is the Cancel() function,
called to abort the channel, or the destructor (currently empty).

Rather than locking every resource that might be in use, it seems to
make more sense to simply cause the cleanup to happen on the
connection thread. The problem I'm having involves how to do that.
In the near future, nsLDAPChannel won't be guaranteed to be the sole
user of the connection, so it only gets to abort its operation(s) off
the connection, not shutdown the connection itself (there will be an
nsLDAPService for worrying about that).

This is one of the main uses of the mCurrentOperation member variable
in nsLDAPChannel: it is the link to the currently running operation on
the nsLDAPConnection thread, which is what needs to be aborted. The
connection thread spends most of its time sitting in the while loop
http://lxr.mozilla.org/seamonkey/source/directory/xpcom/base/src/nsLDAPConnection.cpp#376
so any notification that there's work that needs to be done needs to
be noticed there, which is inside the XPCOM wrapper, presumably by
changing to a loop which sleeps in ldap_result() for a while, and then
tests a flag to see whether any operations need to be aborted.

The abort itself will be coming the LDAP channel code, which is a
client of the wrapper, either from the main thread (via
nsLDAPConnection::Cancel) or the connection thread (in one of the
callbacks).

Now the nsLDAPConnection code doesn't know anything about the callback
code; it only knows about itself (the connection object) and any
operation objects. So in order to notice any operation aborts in a
reasonable way, I think the channel code needs to set an
abortOperation attribute on the nsLDAPOperation object, which in turn
will set a processAbortedOperations attribute on the connection object,
telling the connection code that it needs to scan the list of
operations and process any with the abortOperation attribute set.
This probably involves adding an OnAbort() method to
nsILDAPMessageListener that gets called to do the actual cleanup.
In the case of the channel, most of the code that current lives in
nsLDAPChannel::Cancel would move here.

In order to set an attribute on the current operation, I need to call
something like (the currently non-existent)
mCurrentOperation->SetAbortFlag() from nsLDAPChannel::Cancel(). And
here's where I'm getting stuck: SetAbortFlag would belong to the XPCOM
wrapper, and nsLDAPChannel is a client of the wrapper, which is
(should be) a different module. And since mCurrentOperation gets
changed and (finally) cleared on the connection thread, I can't call
through it without locking it first.

And so I sit, needing to call across a module boundary with a lock held.

Now in the process of writing this up, one possible way around this
occurred to me: add tiny functions to nsLDAPChannel that change and
clear mCurrentOperation, and when I need to do that stuff on the
connection thread, do it through proxy calls to the main thread. This
would eliminate the need for locking mCurrentOperation, I think. It
seems awfully gross, though.

Anyway, I'd love to hear any thoughts or suggestions you have about
this, especially in case I'm missing something obvious.

[Fwd: my ldap locking problem]

Reply via email to