Hi Marc,

On Jul 26, 2010, at 8:31 AM, Marc Guenther wrote:
> On 23.07.2010, at 16:55, Chuck Hill wrote:
> On Jul 23, 2010, at 5:08 AM, Marc Guenther wrote:
>> 
>>> Hi,
>>> 
>>> We recently had some deadlock problems involving EC and OSC locking, and I 
>>> would like to validate my thoughts with the list.
>>> 
>>> As a very basic rule with everything multithreaded, whenever two threads 
>>> want access to the same resources, they have to lock them in a predefined 
>>> identical order. Otherwise you have that classic deadlock possibility, 
>>> where T1 has A and wants B, and T2 has B and wants A.
>>> 
>>> This, for EOF, means the following:
>>> 
>>> Whenever you hold an OSC lock, you are NOT allowed to lock any EC anymore!
>> 
>> I think of it as taking escalating locks: EC -> DB Context -> OSC
> 
> To ensure this (at least the EC -> OSC stuff), I added this assertion to our 
> EC subclass:
> 
>    @Override
>    public void lock() {
>        // make sure the OSC is not locked, or we ourselves are already locked
>        assert !isOSCLocked() || this._lock.isHeldByCurrentThread();
>        super.lock();
>    }
> 
>    /** if the OSC is locked and we are not */
>    private boolean isOSCLocked() {
>        EOObjectStore rootOS = rootObjectStore();
>        if (rootOS instanceof EOObjectStoreCoordinator) {
>            ReentrantLock lck = ((EOObjectStoreCoordinator) rootOS)._lock();
>            return lck != null && lck.isHeldByCurrentThread();
>        }
>        return false;
>    }
> 
> This immediately catched our problem, and so far neither found nor caused any 
> others. Don't know yet if it will cause any trouble.
> 
>>> Why not? Cause the normal order is the other way round. First you lock the 
>>> EC, then, when doing something with it, you lock the OSC.
>>> 
>>> For example, our specific deadlock occurred, cause we had a Thread1:
>>> - lock the OSC
>>> - in a loop, do some operations involving locking of lots of ECs
>>> (we have some stuff in there, which posts notifications to all ECs)
>>> - Unlock OSC
>> 
>> That seems like a long time to have the OSC locked.  Why are you locking the 
>> OSC?
> 
> Actually, we were locking the DBCntxt. We are reloading some EOModels, and it 
> seemed to be necessary to invalidate objects:
> 
>  dbc.invalidateObjectsWithGlobalIDs(dbc.database().snapshots().allKeys());
> 
> Unfortunately, the actual removeModel() and addModel() calls were also in 
> that try block. We moved them further down, and now the deadlock seems to be 
> gone.

I'd want to hold the OSC lock while modifying a model group (reloading models).


>>> Now, while T1 was busy in its loop, some T2 was fetching a single EO:
>>> - lock the EC
>>> - fetch EO causing OSC to lock
>>> -> which waits for T2 to finish
>>> 
>>> And, as soon as the T1 loop reaches the locked EC of T2
>>> 
>>> -> Deadlock
>>> 
>>> 
>>> Now, the questions I have:
>>> - Is this analysis correct?
>> 
>> Yes.
>> 
>> 
>>> - What do we do about it?
>> 
>> Lock in the correct order.  Don't lock and hold locked the OSC.  Use 
>> different EOF stacks (hence different OSC).  Don't touch the same EC in more 
>> than one thread concurrently.
> 
> At least the last one isn't really in our control, is it? All ECs are 
> automatically touched outside of their threads whenever this ObjectsChanged 
> notification is fired around.

Only if they are not locked.  It is a conditional operation: lock if possible, 
or queue if locked.  That is probably the only safe operation across threads.


> I remember a very hard to track down deadlock several years ago (WO 5.0), 
> that was caused by this notification.
> 
>>> If I understand all this correctly, my above rule means:
>>> 
>>> _Whenever you have the need to explicitely lock an OSC, you can't do any 
>>> operation, which would cause an formerly unlocked (by you) EC to be locked._
>>> 
>>> Which might be more than you think. IIRC, you are not allowed to make any 
>>> changes to any EO. Because that causes EOObjectsChangedInStoreNotifications 
>>> to be fired around to all ECs.
>> 
>> That happens when saveChanges() is called.  saveChanges() locks the OSC.
> 
> Ah. And while it has the OSC lock, it locks all ECs in turn? Isn't that 
> exactly the case I described above? Why does this ever work? Or does it lock 
> the ECs after it released the OSC again?
> 
> I guess it uses some kind of magic. It definitely didn't trigger my assertion 
> above.
> 
>>> Which in turn, causes all these ECs to be locked. Which violates the rule.
>> 
>> If they are already locked, it will only queue the notification, not wait 
>> for a lock.
> 
> Which explains the magic :) This stuff is definitely not for the faint 
> hearted.


No.  :-)  This part of EOF is very complex.  It is like a finely engineered 
machine.


>>> Now, if you really need to do this (you NEED to lock an OSC, and you NEED 
>>> to use ECs while having it locked), you could think, Oh it's OK, I just 
>>> lock all of them first. But then again, in what order do you lock the ECs? 
>>> Because if you do it in random order, you again have the possibility of 
>>> deadlock.
>> 
>> That sounds like you may be sharing ECs between threads in a dangerous 
>> pattern.
> 
> Yes. I guess it all boils down to: just don't do that :)

Yes, you are very likely to cause problems doing that.  Avoiding the pattern is 
the best choice.


>>> So, what is everybody doing?
>>> - I don't lock the OSC myself. Why would I want to?
>> 
>>> - I do, but I am VERY careful about what I do with it being locked.
>>> - I do, and so far I have been lucky.
>> 
>> Never any problems, provided you lock it at the appropriate time.
> 
> OK. Thanks for the help. It seems to be working fine now.
> 
> Now I'm back to investigate this missing snapshot issue that's still haunting 
> us.

That is a fun one.  invalidateObjectsWithGlobalIDs can cause that, especially 
if the OSC is not locked.


Chuck


-- 
Chuck Hill             Senior Consultant / VP Development

Practical WebObjects - for developers who want to increase their overall 
knowledge of WebObjects or who are trying to solve specific problems.    
http://www.global-village.net/products/practical_webobjects







Attachment: smime.p7s
Description: S/MIME cryptographic signature

 _______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list      ([email protected])
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com

This email sent to [email protected]

Reply via email to