Re: [IronPython] Performance of IronPython 2 Beta 4 and IronPython 1

Ronnie Maor Tue, 19 Aug 2008 11:40:59 -0700

nice!
thanks for **kw fix. that was really bugging us.

looks like 2.0b5 is where we'll start seriously looking at switching to 2.0.
do you know when it's expected?


On Tue, Aug 19, 2008 at 7:09 PM, Dino Viehland <[EMAIL PROTECTED]> wrote:

> The fixes for these (specifically the ones I said I had fixes for :) ) are
> now checked in and the next source code push (as well as Beta 5) will have
> the fixes.
>
> In addition to these issues I've fixed the **kw perf bug recently reported
> (http://www.codeplex.com/IronPython/WorkItem/View.aspx?WorkItemId=17679).
> I've also attempted to remove some of the overhead of using sets - we had
> an extra level of indirection that wasn't necessary and we could also reduce
> some hashing (where we'd hash twice) we had to do now that we don't use .NET
> dictionaries directly.  It'd be interesting to see whether that helps on
> your machine, which shows sets running slowly.  Finally, creation of
> instances w/ __slots__ was horribly slow and that's fixed as well.  Here's
> the code review mail if anyone's interested in the details:
>
> DLR: OptimizedScriptCode shouldn't be re-writing the AST every time
> through.  I've added a cache so we'll only re-write it once.
>
> Python:
>               Speed up dictionary creation by avoiding repeated locking.
> Now we new up an object array w/ the values and pass it off to create in
> one action.
>
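
As a rough illustration of the dictionary-creation change described above (gather the values first, then hand them over in one action), here is a minimal Python sketch. LockedDict, add and from_items are hypothetical names made up for this example, not IronPython types; the only point is that the bulk path takes the lock once instead of once per item.

```python
import threading


class LockedDict(object):
    """Toy thread-safe mapping; hypothetical, not IronPython's implementation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}
        self.lock_acquisitions = 0  # counts how often the lock was taken

    def add(self, key, value):
        # Per-item path: one lock acquisition for every insert.
        with self._lock:
            self.lock_acquisitions += 1
            self._data[key] = value

    @classmethod
    def from_items(cls, items):
        # Bulk path: gather everything first, then store it under one lock.
        result = cls()
        pending = list(items)
        with result._lock:
            result.lock_acquisitions += 1
            for key, value in pending:
                result._data[key] = value
        return result

    def __len__(self):
        return len(self._data)


slow = LockedDict()
for i in range(1000):
    slow.add(i, str(i))

fast = LockedDict.from_items((i, str(i)) for i in range(1000))
```

Both objects end up with the same contents; they differ only in how many times the lock was acquired along the way.
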
>               List addition should leave a little bit of buffer room for
> further operations, also adding fast paths for extend with well known types.
>  Finally switching OrderedLocker to being a struct and ensuring it handles
> the case where the objects are the same.
>
>               Set: Removing a level of indirection - we now hold directly
> onto a CommonDictionaryStorage instead of going through PythonDictionary for
> the same effect.  I've also removed several __contains__ checks which aren't
> necessary because CommonDictionaryStorage guarantees the correct semantics
> (replacing an existing value only updates the keys)
>
>               PythonFunction - speed up function creation by looking
> directly in the global scope dict and creating the FunctionCode.  Lazily
> creating this does no good as it's always created and the
> Interlocked.CompareExchange call isn't cheap.
>
> Calling new-style classes no longer looks up __init__, it now just does a
> TryGetBoundValue on it.
>
>               Creating instances of types w/ __slots__ is very slow.
> Switched to using an object array and we no longer require code gen for
> generating the accessors.  This all around simplifies the NewTypeMaker code
> as well as the ReflectedSlotProperty code.
>
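
To make the __slots__ change above a little more concrete, here is a small pure-Python model of slot values kept in one flat array with index-based accessors. SlotAccessor, Point and _UNSET are hypothetical names for illustration; this is a sketch of the idea only, not the NewTypeMaker or ReflectedSlotProperty code.

```python
_UNSET = object()  # marker for a slot that has not been assigned yet


class SlotAccessor(object):
    """Descriptor that reads/writes one position of the instance's value array."""

    def __init__(self, index, name):
        self.index = index
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        value = instance._values[self.index]
        if value is _UNSET:
            raise AttributeError(self.name)
        return value

    def __set__(self, instance, value):
        instance._values[self.index] = value


class Point(object):
    # The only real slot is the array; x and y are views into it.
    __slots__ = ('_values',)
    x = SlotAccessor(0, 'x')
    y = SlotAccessor(1, 'y')

    def __init__(self, x, y):
        self._values = [_UNSET, _UNSET]
        self.x = x
        self.y = y


p = Point(1, 2)
try:
    p.z = 3              # no __dict__, so unknown attributes are rejected
    rejected = False
except AttributeError:
    rejected = True
```

Because the accessors are ordinary objects holding an index, nothing has to be generated per class beyond the descriptors themselves.
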
> 17679 - Performance issue when calling a function with **kw
>               We now go through the DictionaryStorage Clone API when we
> have a Python dictionary.  Also added "HasNonStringAttributes" method onto
> DictionaryStorage which SymbolId dicts can optimize away.  But
> CommonDictionaryStorage can also optimize it by avoiding lots of locking.
>  Finally we now throw a reasonable exception when we don't get a dictionary.
>
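
The language-level behaviour behind the **kw fix above can be checked with a few lines of plain Python. describe and call_fails are made-up helpers; the sketch exercises only the observable semantics (the callee gets its own copy of the dictionary, and non-string keys or non-mappings are rejected with TypeError), not IronPython's DictionaryStorage internals.

```python
def describe(**kw):
    # The callee receives its own dictionary; mutating it must not
    # affect the mapping the caller passed in.
    kw['seen'] = True
    return sorted(kw)


options = {'width': 3, 'height': 4}
names = describe(**options)


def call_fails(argument):
    """Return True if describe(**argument) raises TypeError."""
    try:
        describe(**argument)
    except TypeError:
        return True
    return False


non_string_key_rejected = call_fails({1: 'x'})
non_mapping_rejected = call_fails(42)
```
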
> Also making EnvironmentDictionaryStorage not inherit from
> CommonDictionaryStorage.  AddNoLock was certainly not doing the right thing
> and this way it's clearer what's going on there.
>
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:
> [EMAIL PROTECTED] On Behalf Of Michael Foord
> Sent: Friday, August 15, 2008 11:41 AM
> To: Discussion of IronPython
> Subject: Re: [IronPython] Performance of IronPython 2 Beta 4 and IronPython
> 1
>
> Dino Viehland wrote:
> > Ok, I looked into a bunch of these and here's what I've discovered so
> > far and other random comments...
> >
> > Exceptions (100000): 40% slower
> >     IP1: 4703
> >     IP2: 6125
> >     Py:   266
> >
> > I haven't looked at this one yet.  I do know that we have a number of
> > bug fixes for our exception handling which will slow it down, though.
> > I don't consider this to be a high priority.  If we wanted to focus on
> > exception perf I think we'd want to do something radical rather than
> > make small tweaks to the existing code.  If there are certain scenarios
> > where exception perf is critical, it'd be interesting to hear about
> > those and whether we can do anything to improve them.
> >
> >
> I can look at this.
>
> > Engine execution: 8000% slower!!
> >     IP1: 1600
> >     IP2: 115002
> >
> > This is just a silly bug.  We're doing a tree re-write of the AST and
> > we do that every time through.  Caching that re-write gets us back to
> > 1.x performance.  I have a fix for this.
> >
>
> Great! (1.x performance was very impressive.)
>
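
The engine-execution fix described above amounts to memoizing the rewritten tree. A minimal sketch in Python, assuming a toy ScriptCode class whose names (_rewrite, run, rewrite_count) are invented for this example and are not the DLR's OptimizedScriptCode API:

```python
class ScriptCode(object):
    """Toy stand-in for compiled script code; all names are hypothetical."""

    def __init__(self, tree):
        self._tree = tree
        self._rewritten = None   # cache for the rewritten tree
        self.rewrite_count = 0   # how many times the rewrite actually ran

    def _rewrite(self, tree):
        # Placeholder for an expensive tree transformation.
        self.rewrite_count += 1
        return [node.upper() for node in tree]

    def run(self):
        # Rewrite on first use only; later runs reuse the cached result.
        if self._rewritten is None:
            self._rewritten = self._rewrite(self._tree)
        return self._rewritten


code = ScriptCode(['load', 'call', 'return'])
for _ in range(1000):
    result = code.run()
```

Without the cache the rewrite would run once per execution, which is the kind of cost the benchmark numbers above point at.
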
> > Create function: 25% slower
> >     IP1: 2828
> >     IP2: 3640
> >     Py:  2766
> >
> > Part of this is from a bug fix but the fix could be more efficient.  In
> > 1.x we don't look up __module__ from the global scope.  In 2.x we do
> > this lookup but it searches all scopes - which isn't even correct.  But
> > we can do a direct lookup which is a little faster - so I have a
> > partial fix for this.  This will still be a little slower than 1.x
> > though.
> >
> >
> Ok.
>
> > Define oldstyle (1 000 000): 33% slower
> >     IP1: 1781
> >     IP2: 2671
> >     Py:  2108
> >
> > Is this critical?  I'd rather just live w/ the slowness than fix
> > something that will be gone in 3.x :)
> >
> >
>
> Not a problem for us - I merely noted it. In 1.x we needed to switch a
> few classes to old style for performance reasons (but we don't
> repeatedly redefine them - it was instantiation time). In 2.x we will
> need to switch back (which is great).
>
> > Lists (10 000): 50% slower
> >     IP1: 10422
> >     IP2: 16109
> >     Py:   6094
> >
> > The primary issue here is that adding 2 lists ends up creating a new
> > list whose storage is the exact size needed for storing the two lists.
> > When you append to it after adding, we need to allocate a brand new
> > array - and you're not dealing with small arrays here.  We can add a
> > little extra space depending on the size of the array to minimize the
> > chance of needing a re-size.  That gets us to about 10% slower than
> > CPython.  I'm also going to add a strongly typed extend overload which
> > should make those calls a little faster.
> >
> >
>
> Python lists will typically grow to always have a lot of space. Creating
> a list with no extra space seems like a problem. My benchmark for this
> was unrealistic though (we add lists and extend them a lot - but
> typically they're nothing like that size).
>
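
The list-addition issue discussed above (a result sized exactly, so the very next append has to reallocate) can be modelled with a toy array-backed list. GrowableList is a hypothetical class, and the padding policy used here (about 1/8 extra plus a small constant) is an assumption picked for illustration, not the policy IronPython adopted.

```python
class GrowableList(object):
    """Toy array-backed list that counts reallocations; hypothetical."""

    def __init__(self, items=(), capacity=None):
        items = list(items)
        self._size = len(items)
        self._capacity = max(capacity or 0, self._size)
        self._store = items + [None] * (self._capacity - self._size)
        self.reallocations = 0

    def append(self, item):
        if self._size == self._capacity:
            # Out of room: allocate a bigger backing array and copy over.
            self._capacity = self._capacity * 2 + 1
            new_store = [None] * self._capacity
            new_store[:self._size] = self._store[:self._size]
            self._store = new_store
            self.reallocations += 1
        self._store[self._size] = item
        self._size += 1

    def concat(self, other, padded):
        total = self._size + other._size
        # Assumed policy: leave roughly 1/8 extra room when padding.
        capacity = total + (total // 8 + 4 if padded else 0)
        combined = self._store[:self._size] + other._store[:other._size]
        return GrowableList(combined, capacity)

    def __len__(self):
        return self._size


a = GrowableList(range(10000))
b = GrowableList(range(10000))

exact = a.concat(b, padded=False)
exact.append(1)          # no spare room, so this forces a reallocation

roomy = a.concat(b, padded=True)
roomy.append(1)          # fits into the spare capacity
```
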
>
> > Sets2 (100 000): 500% slower
> >     IP1:  4984
> >     IP2: 30547
> >     Py:   1203
> >
> > This one I actually cannot repro yet (I've tried it on 3 machines but
> > they've all been Vista).  I'm going to try next on a Srv 2k3 machine
> > and see if I can track it down.  But more information would be useful.
> >
>
> Hmmm... I wonder if it is an oddity with my machine. Unfortunately I am
> not at work today and can't repeat it. I've just run it on Vista (.NET
> 2.0.50727.3053) running under VMWare Fusion (but on a kick-arse machine).
>
> IP1.1.2:  3515
> IP2.0B4: 2516
>
> I need to rerun the whole Resolver port on someone else's machine.
>
> > Comparing (== and !=):
> >     IP1: 278597
> >     IP2: 117662
> >
> > This one is actually pretty interesting (even though we're faster in
> > 2.x) - there's an issue with the test here.  You've defined "__neq__"
> > instead of "__ne__".
>
> Ha! Oops. :-)
>
> > That causes the != comparison to ultimately compare based upon object
> > identity - which is extremely slow.  There might be some things we can
> > do to make the object identity comparison faster (for example,
> > recognizing that we're doing equality and just need an eq or ne answer
> > rather than a 1, -1, 0 comparison value).  But I'm going to assume
> > comparing on object identity isn't very important right now - let me
> > know if I'm wrong.
>
> We do use identity comparison a lot - but I'm not sure if it is in
> performance critical parts of our code.  I can review this.
>
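
The __neq__ slip discussed above is easy to demonstrate: Python never calls a method of that name for !=, only __ne__ is consulted. The sketch below uses two made-up classes, Misnamed and Correct, and simply records which methods the interpreter invokes. Note the fallback differs by version: Python 2 drops to its default comparison when __ne__ is missing (the case described in this thread), while Python 3 derives != from __eq__; in neither case is __neq__ called.

```python
calls = []


class Misnamed(object):
    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return self.val == other.val

    def __neq__(self, other):
        # Not a special method name: '!=' never reaches this.
        calls.append('__neq__')
        return not self.__eq__(other)

    def __hash__(self):
        return hash(self.val)


class Correct(Misnamed):
    def __ne__(self, other):
        calls.append('__ne__')
        return not self.__eq__(other)


Misnamed(1) != Misnamed(2)
after_misnamed = list(calls)

Correct(1) != Correct(2)
after_correct = list(calls)
```
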
> > But switching this to __ne__ causes us to be a little faster than
> > CPython.  They have a great advantage on object identity comparisons -
> > they can just use the object's address.
> >
> >
> Sure.
>
> > I was also curious what happens to this case if we use __slots__.  That
> > identified yet another massive performance regression which I have a
> > fix for - creating instances that have __slots__ defined is horribly
> > slow.  With that bug fixed and using slots and __ne__ instead of
> > __neq__ we can actually run this over 2x faster than CPython (on Vista
> > x86 .NET 3.5SP1 on a 2.4GHz Core 2 w/ 4GB of RAM).
> >
> >
> Cool.
>
> Michael
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
> > Michael Foord
> > Sent: Thursday, August 14, 2008 9:42 AM
> > To: Discussion of IronPython
> > Subject: Re: [IronPython] Performance of IronPython 2 Beta 4 and
> > IronPython 1
> >
> > Just for fun I also compared with CPython. The results are interesting,
> > I'll turn it into a blog post of course...
> >
> > Results in milliseconds with a granularity of about 15ms and so an
> > accuracy of +/- ~60ms.
> > All testing with 10 000 000 operations unless otherwise stated.
> >
> > The version of Python I compared against was Python 2.4.
> >
> > Empty loop (overhead):
> >     IP1: 422
> >     IP2: 438
> >     Py: 3578
> >
> > Create instance newstyle:
> >     IP1: 20360
> >     IP2:  1109
> >     Py:   4063
> >
> > Create instance oldstyle:
> >     IP1: 3766
> >     IP2: 3359
> >     Py:  4797
> >
> > Function call:
> >     IP1: 937
> >     IP2: 906
> >     Py: 3313
> >
> > Create function: 25% slower
> >     IP1: 2828
> >     IP2: 3640
> >     Py:  2766
> >
> > Define newstyle (1 000 000):
> >     IP1: 42047
> >     IP2: 20484
> >     Py:  23921
> >
> > Define oldstyle (1 000 000): 33% slower
> >     IP1: 1781
> >     IP2: 2671
> >     Py:  2108
> >
> > Comparing (== and !=):
> >     IP1: 278597
> >     IP2: 117662
> >     Py:   62423
> >
> > Sets:
> >     IP1: 37095
> >     IP2: 30860
> >     Py:   8047
> >
> > Lists (10 000): 50% slower
> >     IP1: 10422
> >     IP2: 16109
> >     Py:   6094
> >
> > Recursion (10 000):
> >     IP1: 1125
> >     IP2: 1000
> >     Py:  3609
> >
> > Sets2 (100 000): 500% slower
> >     IP1:  4984
> >     IP2: 30547
> >     Py:   1203
> >
> > func_with_args:
> >     IP1: 6312
> >     IP2: 5906
> >     Py: 11250
> >
> > method_with_args:
> >     IP1: 20594
> >     IP2: 11813
> >     Py:  14875
> >
> > method_with_kwargs:
> >     IP1: 27953
> >     IP2: 11187
> >     Py:  20032
> >
> > import: 15% slower
> >     IP1: 28469
> >     IP2: 32000
> >     Py:  25782
> >
> > global: 20% slower
> >     IP1: 1047
> >     IP2: 1203
> >     Py:  4141
> >
> > Exceptions (100000): 40% slower
> >     IP1: 4703
> >     IP2: 6125
> >     Py:   266
> >
> > Engine execution: 8000% slower!!
> >     IP1: 1600
> >     IP2: 115002
> >
> >
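
The "N% slower" labels in the results above look like rough estimates. A short sketch for recomputing them from the listed timings, taking IP1 as the baseline (that choice of baseline is an assumption; percent_slower is a helper written for this example):

```python
def percent_slower(baseline_ms, measured_ms):
    """Slowdown of measured_ms relative to baseline_ms, in percent."""
    return (measured_ms - baseline_ms) * 100.0 / baseline_ms


# (IP1 ms, IP2 ms) pairs copied from the results above.
timings = {
    'Create function': (2828, 3640),
    'Define oldstyle': (1781, 2671),
    'Lists': (10422, 16109),
    'Sets2': (4984, 30547),
    'Exceptions': (4703, 6125),
}

slowdowns = dict(
    (name, round(percent_slower(ip1, ip2)))
    for name, (ip1, ip2) in timings.items()
)
```

Measured this way, "Define oldstyle" comes out nearer 50% than 33%, and "Exceptions" nearer 30% than 40%; the 33% figure matches the difference taken relative to the IP2 time instead.
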
> > Michael Foord wrote:
> >
> >> Hello all,
> >>
> >> I've ported Resolver One to run on IronPython 2 Beta 4 to check for
> >> any potential problems (we will only do a *proper* port once IP 2 is
> >> out of beta).
> >>
> >> The basic porting was straightforward and several bugs have been fixed
> >> since IP 2 B3 - many thanks to the IronPython team.
> >>
> >> The good news is that Resolver One is only 30-50% slower than Resolver
> >> One on IronPython 1! (It was 300 - 400% slower on top of IP 2 B3.)
> >> Resolver One is fairly heavily optimised around the performance
> >> hotspots of IronPython 1, so we expect to have to do a fair bit of
> >> profiling and refactoring to readjust to the performance profile of
> >> IP 2.
> >>
> >> Having said that, there are a few oddities (and the areas that slow
> >> down vary tremendously depending on which spreadsheet we use to
> >> benchmark it - making it fairly difficult to track down the hotspots).
> >>
> >> We have one particular phase of spreadsheet calculation that takes
> >> 0.4 seconds on IP1 and around 6 seconds on IP2, so I have been doing
> >> some micro-benchmarking to try and identify the hotspot. I've
> >> certainly found part of the problem.
> >>
> >> For those that are interested I've attached the very basic
> >> microbenchmarks I've been using. The nice thing is that in *general*
> >> IP2 does outperform IP1.
> >>
> >> The results that stand out in the other direction are:
> >>
> >> Using sets with custom classes (that define '__eq__', '__ne__' and
> >> '__hash__') seems to be 6 times slower in IronPython 2.
> >>
> >> Adding lists together is about 50% slower.
> >>
> >> Defining functions seems to be 25% slower and defining old style
> >> classes about 33% slower. (Creating instances of new style classes is
> >> massively faster though - thanks!)
> >>
> >> The code I used to test sets (sets2.py) is as follows:
> >>
> >> from System import DateTime
> >>
> >> class Thing(object):
> >>     def __init__(self, val):
> >>         self.val = val
> >>
> >>     def __eq__(self, other):
> >>         return self.val == other.val
> >>
> >>     def __neq__(self, other):
> >>         return not self.__eq__(other)
> >>
> >>     def __hash__(self):
> >>         return hash(self.val)
> >>
> >> def test(s):
> >>     a = set()
> >>     for i in xrange(100000):
> >>         a.add(Thing(i))
> >>         a.add(Thing(i+1))
> >>         Thing(i) in a
> >>         Thing(i+2) in a
> >>     return (DateTime.Now - s).TotalMilliseconds
> >>
> >> s = DateTime.Now
> >> print test(s)
> >>
> >>
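
sets2.py above depends on System.DateTime, so it runs only under IronPython. Below is a hedged sketch of a variant that uses the standard time module instead, so the same loop can also be timed on CPython. run_sets2 and its count parameter are additions for this example, and __ne__ is spelled correctly here; timings from it are not comparable with the DateTime-based figures in this thread.

```python
import time


class Thing(object):
    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return self.val == other.val

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(self.val)


def run_sets2(count=100000):
    """Run the sets2 loop; return (elapsed milliseconds, final set size)."""
    start = time.time()
    a = set()
    for i in range(count):
        a.add(Thing(i))
        a.add(Thing(i + 1))
        Thing(i) in a
        Thing(i + 2) in a
    elapsed_ms = (time.time() - start) * 1000.0
    return elapsed_ms, len(a)


# A small count keeps this quick; pass 100000 to match the original loop.
elapsed_ms, size = run_sets2(1000)
```
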
> >> Interestingly the time taken is exactly the same if I remove the
> >> definition of '__hash__'.
> >>
> >> The full set of results below:
> >>
> >> Results in milliseconds with a granularity of about 15ms and so an
> >> accuracy of +/- ~60ms.
> >> All testing with 10 000 000 operations unless otherwise stated.
> >>
> >> Empty loop (overhead):
> >>    IP1: 421.9
> >>    IP2: 438
> >>
> >> Create instance newstyle:
> >>    IP1: 20360
> >>    IP2: 1109
> >>
> >> Create instance oldstyle:
> >>    IP1: 3766
> >>    IP2: 3359
> >>
> >> Function call:
> >>    IP1: 937
> >>    IP2: 906
> >>
> >> Create function: 25% slower
> >>    IP1: 2828
> >>    IP2: 3640
> >>
> >> Define newstyle (1 000 000):
> >>    IP1: 42047
> >>    IP2: 20484
> >>
> >> Define oldstyle (1 000 000): 33% slower
> >>    IP1: 1781
> >>    IP2: 2671
> >>
> >> Comparing (== and !=):
> >>    IP1: 278597
> >>    IP2: 117662
> >>
> >> Sets (with numbers):
> >>    IP1: 37095
> >>    IP2: 30860
> >>
> >> Lists (10 000): 50% slower
> >>    IP1: 10422
> >>    IP2: 16109
> >>
> >> Recursion (10 000):
> >>    IP1: 1125
> >>    IP2: 1000
> >>
> >> Sets2 (100 000): 600% slower
> >>    IP1: 4984
> >>    IP2: 30547
> >>
> >>
> >> I'll be doing more as the 600% slow down for sets and the 50% slow
> >> down for lists accounts for some of the dependency analysis problem
> >> but not all of it.
> >>
> >> Many Thanks
> >>
> >> Michael Foord
> >> --
> >> http://www.resolversystems.com
> >> http://www.ironpythoninaction.com
> >>
> >>
> >>
> >> ------------------------------------------------------------------------
> >>
> >> _______________________________________________
> >> Users mailing list
> >> [email protected]
> >> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> >>
> >
> >
> > --
> > http://www.ironpythoninaction.com/
> > http://www.voidspace.org.uk/
> > http://www.trypython.org/
> > http://www.ironpython.info/
> > http://www.resolverhacks.net/
> > http://www.theotherdelia.co.uk/
> >
> >
>
>
> --
> http://www.ironpythoninaction.com/
> http://www.voidspace.org.uk/
> http://www.trypython.org/
> http://www.ironpython.info/
> http://www.theotherdelia.co.uk/
> http://www.resolverhacks.net/
>
>

