Re: [IronPython] Ironclad problem, with which someone here may be able to help

William Reade Thu, 06 Nov 2008 06:01:27 -0800

Thanks Dino -- it seems that if I use ipy with -X:MTA, I can no longerreproduce the problem. However, this bothers me a bit: I'm not competentto follow all the consequences here, but this situation seems to implythat Ironclad won't be usable safely from any STA thread. Is this anintended restriction or a bug?

Incidentally, tracking down the call stacks proved to be hard work: thetiming changed enough that I only got a single failure over dozens ofruns, and it turned out I'd got the logging wrong, so I didn't find outanything useful :(.


Dino Viehland wrote:

I would suggest getting a snap shot of the call stacks when this is happening 
if that's possible.  I can't pin anything down but I wonder if you could have 
an STA object or something that otherwise requires message pumping.  That 
message pumping could happen while you're doing a Monitor.Enter call.  If that 
was happening maybe there is some subtle CLR bug or a surprise feature?  It is 
surprising that Monitor.Enter can go re-entrant but it can...

So it'd be interesting to get thread snapshots and see if

               EnsureGIL (443) 2
               EnsureGIL (443) 1      <- omg, wtf, bbq, etc.


Could be happening because Thread 1 experiences contention, blocks and pumps 
messages, causing the finalizer thread (2) to run, the lock is acquired and ... 
?

Only other thing I could think of is does it repro on other machines?  Maybe 
it's a hardware bug as unlikely as that seems?

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of William Reade
Sent: Wednesday, November 05, 2008 10:01 AM
To: Discussion of IronPython
Subject: Re: [IronPython] Ironclad problem, with which someone here may be able 
to help

The log starts in the middle (after many lock/unlocks, some from each
thread); I'm running on x86; and I have no additional AppDomains.

I don't think it would be safe for me to entirely avoid locking during
finalization, but I could probably cut it down to a quick lock, on a
separate object, to enqueue pointers for cleanup and deallocation on the
main thread. However, I'm reluctant to do that until I'm sure that the
problem is specifically related to GC, otherwise it'll just come back as
soon as anyone tries any serious multithreading :).

Curt Hagenlocher wrote:

Locking during finalization is often considered to be a bad idea.  In
particular, locking without a timeout introduces the possibility that
you will hang the finalization thread, preventing further objects from
being finalized.  But clearly, that's not what's happening here.

Other questions that probably don't matter but might be interesting to
know:

Can we assume that the finalization thread isn't the first place where
this lock is required?  That your log starts somewhere in the middle?

Is this under x86 or x64 or both?

Are you creating any additional AppDomains in the process?


On Wed, Nov 5, 2008 at 10:15 AM, William Reade
<[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:

    Hi Curt

    I am indeed; that's how I know thread 2 is the GC thread. Is
    locking during GC forbidden?

    William

    Curt Hagenlocher wrote:

        ...or, for that matter, any __del__ methods from within Python
        -- which ultimately are handled by finalization.

        On Wed, Nov 5, 2008 at 9:37 AM, Curt Hagenlocher
        <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
        <mailto:[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>>>
        wrote:

           So, the obvious question for me is whether or not you're
        using any
           finalizers.


           On Wed, Nov 5, 2008 at 5:57 AM, William Reade
           <[EMAIL PROTECTED]
        <mailto:[EMAIL PROTECTED]>
        <mailto:[EMAIL PROTECTED]
        <mailto:[EMAIL PROTECTED]>>>

           wrote:

               Hi all

               While running the numpy tests, I've come across a situation
               which, to the best of my knowledge, is simply
        impossible. I'm
               hoping that one of the local .NET gurus will be able to
        tell
               me what I'm missing, or point me somewhere I can get
        more insight.

               The 4 methods involved are as follows:
               -----------------------
                     public int GetThreadId()
                     {
                         return Thread.CurrentThread.ManagedThreadId;
                     }

                     public void WriteFlush(string info)
                     {
                         Console.WriteLine(info);
                         Console.Out.Flush();
                     }

                     public void EnsureGIL()
                     {
                         Monitor.Enter(this.dispatcherLock);
                         this.WriteFlush(String.Format(
                             "EnsureGIL ({1}) {0}", this.GetThreadId(),
               Builtin.id(this.dispatcherLock)));
                     }

                     public void ReleaseGIL()
                     {
                         this.WriteFlush(String.Format(
                             "ReleaseGIL ({1}) {0}\n", this.GetThreadId(),
               Builtin.id(this.dispatcherLock)));
                         Monitor.Exit(this.dispatcherLock);
                     }
               -----------------------
               ...and they can, and do, occasionally produce output as
        follows:
               -----------------------
               EnsureGIL (443) 2
               EnsureGIL (443) 1      <- omg, wtf, bbq, etc.
               ReleaseGIL (443) 2

               EnsureGIL (443) 2
               ReleaseGIL (443) 1

               ReleaseGIL (443) 2
               -----------------------
               When this happens, the process continues happily for a
        short
               time and then falls over in a later call to ReleaseGIL
        (after
               successfully calling it several times). The error is "
        Object
               synchronization method was called from an
        unsynchronized block
               of code", which I understand to mean "you can't release
        this
               lock because you don't hold it".

               It doesn't happen very often, but I can usually
        reproduce it
               by running
        test_multiarray.TestFromToFile.test_malformed a few
               hundred times. It may be relevant to note that thread 2
        is the
               GC thread, and thread 1 is the main thread. I have
        considered
               the following possibilities:

               (1) That I'm locking on the wrong object. I believe
        that isn't
               the case, because it's constructed only once, as a "new
               Object()" (ie, a reference type), and is only subsequently
               used for locking; and, because it keeps the same ipy id
               throughout.

               (2) That Monitor.Enter occasionally allows two different
               threads to acquire the same lock. I consider this extremely
               unlikely, because... well, how many multithreaded .NET apps
               already exist? If Monitor really were broken, I think we'd
               probably know about it by now.

               (3) That calling Flush() on a SyncTextWriter (the type of
               Console.Out) doesn't actually do anything, and the
        output is
               somehow wrongly ordered (although I can't imagine how this
               could actually be: if the locking is really working,
        then my
               console writes are strictly sequential). I don't have
        access
               to the code, so I have no idea how it's implemented,
        but even
               if this is the case it doesn't help much with the
        fundamental
               problem (the synchronisation error which follows).

               Apart from the above, I'm out of ideas. Can anyone suggest
               what I've missed?

               William
               _______________________________________________
               Users mailing list
               Users@lists.ironpython.com
        <mailto:Users@lists.ironpython.com>
        <mailto:Users@lists.ironpython.com
        <mailto:Users@lists.ironpython.com>>


         http://lists.ironpython.com/listinfo.cgi/users-ironpython.com



        ------------------------------------------------------------------------



        _______________________________________________
        Users mailing list
        Users@lists.ironpython.com <mailto:Users@lists.ironpython.com>
        http://lists.ironpython.com/listinfo.cgi/users-ironpython.com



    _______________________________________________
    Users mailing list
    Users@lists.ironpython.com <mailto:Users@lists.ironpython.com>
    http://lists.ironpython.com/listinfo.cgi/users-ironpython.com


------------------------------------------------------------------------

_______________________________________________
Users mailing list
Users@lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com


_______________________________________________
Users mailing list
Users@lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com

_______________________________________________
Users mailing list
Users@lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com


_______________________________________________
Users mailing list
Users@lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com

Re: [IronPython] Ironclad problem, with which someone here may be able to help

Reply via email to