Somewhere around 4 weeks from now.

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Ronnie Maor
Sent: Tuesday, August 19, 2008 11:41 AM
To: Discussion of IronPython
Subject: Re: [IronPython] Performance of IronPython 2 Beta 4 and IronPython 1

Nice!
Thanks for the **kw fix - that was really bugging us.

Looks like 2.0b5 is where we'll start seriously looking at switching to 2.0.
Do you know when it's expected?
On Tue, Aug 19, 2008 at 7:09 PM, Dino Viehland <[EMAIL PROTECTED]> wrote:
The fixes for these (specifically the ones I said I had fixes for :) ) are now 
checked in and the next source code push (as well as Beta 5) will have the 
fixes.

In addition to these issues I've fixed the **kw perf bug recently reported 
(http://www.codeplex.com/IronPython/WorkItem/View.aspx?WorkItemId=17679).  I've 
also attempted to remove some of the overhead of using sets: we had an extra 
level of indirection that wasn't necessary, and we could also avoid some double 
hashing we had to do now that we don't use .NET dictionaries directly.  It'd be 
interesting to see whether that helps on the machine where you're seeing sets 
run slowly.  Finally, creation of instances with __slots__ was horribly slow, 
and that's fixed as well.  Here's the code review mail if anyone's interested 
in the details:

DLR: OptimizedScriptCode shouldn't be re-writing the AST every time through.  
I've added a cache so we'll only re-write it once.

Python:
              Speed up dictionary creation by avoiding repeated locking.  Now 
we new up an object array with the values and pass it off to create in one 
action.
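
For illustration only (a guess at the shape of code this affects, not the 
actual benchmark), this is the sort of dict-literal loop that should benefit:

from System import DateTime

def make_dicts(n):
    # each iteration builds a fresh dict literal; the values are now
    # packed into one object array and handed to the storage in one call
    start = DateTime.Now
    for i in xrange(n):
        d = {'a': i, 'b': i + 1, 'c': i + 2, 'd': i + 3}
    return (DateTime.Now - start).TotalMilliseconds

print make_dicts(1000000)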

              List addition now leaves a little bit of buffer room for further 
operations, and there are fast paths for extend with well-known types.  
Finally, OrderedLocker is now a struct, and it handles the case where the two 
objects are the same.

              Set: Removing a level of indirection - we now hold directly onto 
a CommonDictionaryStorage instead of going through PythonDictionary for the 
same effect.  I've also removed several __contains__ checks which aren't 
necessary because CommonDictionaryStorage guarantees the correct semantics 
(replacing an existing value only updates the keys).

              PythonFunction - speed up function creation by looking directly 
in the global scope dict and by creating the FunctionCode eagerly.  Lazily 
creating it does no good, as it's always created anyway, and the 
Interlocked.CompareExchange call isn't cheap.

Calling new-style classes no longer looks up __init__; it now just does a 
TryGetBoundValue on it.

              Creating instances of types with __slots__ was very slow.  
Switched to using an object array, and we no longer require code gen for 
generating the accessors.  This simplifies the NewTypeMaker code as well as 
the ReflectedSlotProperty code all around.
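
As a concrete (made-up) example of the pattern that was slow, instantiating a 
__slots__ type in a tight loop looks roughly like this - the class and counts 
here are illustrative, not the real test:

from System import DateTime

class Point(object):
    # illustrative __slots__ type, not the actual benchmark class
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

def bench(n):
    start = DateTime.Now
    for i in xrange(n):
        Point(i, i + 1)
    return (DateTime.Now - start).TotalMilliseconds

print bench(1000000)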

17679 - Performance issue when calling a function with **kw
              We now go through the DictionaryStorage Clone API when we have a 
Python dictionary.  Also added a "HasNonStringAttributes" method on 
DictionaryStorage, which SymbolId dicts can optimize away and which 
CommonDictionaryStorage can optimize by avoiding lots of locking.  Finally, we 
now throw a reasonable exception when we don't get a dictionary.
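
The call shape the work item covers is passing a dictionary through **kw; a 
minimal sketch (names and counts are illustrative, not the actual benchmark):

from System import DateTime

def target(**kw):
    return len(kw)

def bench(n):
    kwargs = {'alpha': 1, 'beta': 2, 'gamma': 3}
    start = DateTime.Now
    for i in xrange(n):
        # the dict passed via **kw gets copied for each call; the fix
        # routes that copy through the DictionaryStorage Clone API
        target(**kwargs)
    return (DateTime.Now - start).TotalMilliseconds

print bench(1000000)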

Also made EnvironmentDictionaryStorage no longer inherit from 
CommonDictionaryStorage.  AddNoLock was certainly not doing the right thing, 
and this way it's clearer what's going on there.



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Michael Foord
Sent: Friday, August 15, 2008 11:41 AM
To: Discussion of IronPython
Subject: Re: [IronPython] Performance of IronPython 2 Beta 4 and IronPython 1

Dino Viehland wrote:
> Ok, I looked into a bunch of these and here's what I've discovered so far and 
> other random comments...
>
> Exceptions (100000): 40% slower
>     IP1: 4703
>     IP2: 6125
>     Py:   266
>
> I haven't looked at this one yet.  I do know that we have a number of bug 
> fixes for our exception handling which will slow it down, though.  I don't 
> consider this to be a high priority.  If we wanted to focus on exception 
> perf I think we'd want to do something radical rather than small tweaks to 
> the existing code.  If there are certain scenarios where exception perf is 
> critical, though, it'd be interesting to hear about those and whether we can 
> do anything to improve them.
>
>
I can look at this.
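
(The exceptions test is presumably just raise/except in a tight loop, 
something like the sketch below - the exception type and body are 
illustrative; only the 100000 count comes from the figures above.)

from System import DateTime

def bench(n):
    start = DateTime.Now
    for i in xrange(n):
        try:
            raise ValueError(i)
        except ValueError:
            pass
    return (DateTime.Now - start).TotalMilliseconds

print bench(100000)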

> Engine execution: 8000% slower!!
>     IP1: 1600
>     IP2: 115002
>
> This is just a silly bug.  We're doing a tree re-write of the AST and we do 
> that every time through.  Caching that re-write gets us back to 1.x 
> performance.  I have a fix for this.
>

Great! (1.x performance was very impressive.)
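
(For anyone wanting to reproduce the engine-execution case, it is presumably 
repeated execution of a pre-compiled snippet through the hosting API.  The 
sketch below assumes the 2.0 hosting surface - Python.CreateEngine, 
CreateScriptSourceFromString, Compile and Execute - and is illustrative rather 
than the actual benchmark.)

import clr
clr.AddReference('IronPython')
clr.AddReference('Microsoft.Scripting')

from System import DateTime
from IronPython.Hosting import Python

def bench(n):
    engine = Python.CreateEngine()
    code = engine.CreateScriptSourceFromString('1 + 1').Compile()
    start = DateTime.Now
    for i in xrange(n):
        # before the fix each execution re-wrote the AST;
        # caching the re-write restores 1.x-level speed
        code.Execute()
    return (DateTime.Now - start).TotalMilliseconds

print bench(10000)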

> Create function: 25% slower
>     IP1: 2828
>     IP2: 3640
>     Py:  2766
>
> Part of this is from a bug fix but the fix could be more efficient.  In 1.x 
> we don't look up __module__ from the global scope.  In 2.x we do this lookup 
> but it searches all scopes - which isn't even correct.  But we can do a 
> direct lookup which is a little faster - so I have a partial fix for this.  
> This will still be a little slower than 1.x though.
>
>
Ok.
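
(The create-function test is essentially a def statement executed repeatedly; 
a rough, illustrative sketch:)

from System import DateTime

def bench(n):
    start = DateTime.Now
    for i in xrange(n):
        # each pass creates a new function object, which is where the
        # __module__ lookup cost shows up
        def f(x):
            return x
    return (DateTime.Now - start).TotalMilliseconds

print bench(10000000)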

> Define oldstyle (1 000 000): 33% slower
>     IP1: 1781
>     IP2: 2671
>     Py:  2108
>
> Is this critical?  I'd rather just live with the slowness than fix 
> something that will be gone in 3.x :)
>
>

Not a problem for us - I merely noted it. In 1.x we needed to switch a
few classes to old style for performance reasons (but we don't
repeatedly redefine them - it was instantiation time). In 2.x we will
need to switch back (which is great).

> Lists (10 000): 50% slower
>     IP1: 10422
>     IP2: 16109
>     Py:   6094
>
> The primary issue here is that adding 2 lists ends up creating a new list 
> whose storage is the exact size needed for storing the two lists.  When you 
> append to it after adding it we need to allocate a brand new array - and 
> you're not dealing with small arrays here.  We can add a little extra space 
> depending on the size of the array to minimize the chance of needing a 
> re-size.  That gets us to about 10% slower than CPython.  I'm also going to 
> add a strongly typed extend overload which should make those calls a little 
> faster.
>
>

Python lists typically grow so that they always have some spare capacity.
Creating a list with no extra space seems like a problem. My benchmark for
this was unrealistic though (we add lists and extend them a lot - but
typically they're nowhere near that size).
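
For reference, the list test exercises addition followed by further 
operations, roughly like this (sizes and counts are illustrative, not the 
real benchmark):

from System import DateTime

def bench(n):
    start = DateTime.Now
    for i in xrange(n):
        a = range(100)
        b = range(100)
        c = a + b       # result used to be sized exactly to fit
        c.append(i)     # so the first append forced a full reallocation
        c.extend(b)
    return (DateTime.Now - start).TotalMilliseconds

print bench(10000)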


> Sets2 (100 000): 500% slower
>     IP1:  4984
>     IP2: 30547
>     Py:   1203
>
> This one I actually cannot repro yet (I've tried it on 3 machines but they've 
> all been Vista).  I'm going to try next on a Srv 2k3 machine and see if I can 
> track it down.  But more information would be useful.
>

Hmmm... I wonder if it is an oddity with my machine. Unfortunately I am
not at work today and can't repeat it. I've just run it on Vista (.NET
2.0.50727.3053) running under VMWare Fusion (but on a kick-arse machine).

IP1.1.2:  3515
IP2.0B4: 2516

I need to rerun the whole Resolver port on someone else's machine.

> Comparing (== and !=):
>     IP1: 278597
>     IP2: 117662
>
> This one is actually pretty interesting (even though we're faster in 2.x) - 
> there's an issue with the test here.  You've defined "__neq__" instead of 
> "__ne__".

Ha! Oops. :-)
>  That causes the != comparison to ultimately compare based upon object 
> identity - which is extremely slow.  There might be some things we can do to 
> make the object identity comparison faster (for example, recognizing that 
> we're doing equality and just need an eq or ne answer rather than a 1, -1, 0 
> comparison value).  But I'm going to assume comparing on object identity 
> isn't very important right now - let me know if I'm wrong.
We do use identity comparison a lot - but I'm not sure if it is in
performance-critical parts of our code.  I can review this.

> But switching this to __ne__ makes us a little faster than CPython.  They 
> have a great advantage on object identity comparisons - they can just use 
> the object's address.
>
>
Sure.

> I was also curious what happens to this case if we use __slots__.  That 
> identified yet another massive performance regression which I have a fix for: 
> creating instances that have __slots__ defined is horribly slow.  With that 
> bug fixed, and using __slots__ and __ne__ instead of __neq__, we can actually 
> run this over 2x faster than CPython (on Vista x86, .NET 3.5 SP1, on a 2.4GHz 
> Core 2 with 4GB of RAM).
>
>
Cool.
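
For completeness, the corrected benchmark class would look something like this 
(a sketch combining the two changes above - __ne__ spelled correctly and 
__slots__ added):

class Thing(object):
    __slots__ = ('val',)

    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return self.val == other.val

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(self.val)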

Michael

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Michael Foord
> Sent: Thursday, August 14, 2008 9:42 AM
> To: Discussion of IronPython
> Subject: Re: [IronPython] Performance of IronPython 2 Beta 4 and IronPython 1
>
> Just for fun I also compared with CPython.  The results are interesting - 
> I'll turn it into a blog post, of course...
>
> Results in milliseconds with a granularity of about 15ms and so an accuracy 
> of +/- ~60ms.
> All testing with 10 000 000 operations unless otherwise stated.
>
> The version of Python I compared against was Python 2.4.
>
> Empty loop (overhead):
>     IP1: 422
>     IP2: 438
>     Py: 3578
>
> Create instance newstyle:
>     IP1: 20360
>     IP2:  1109
>     Py:   4063
>
> Create instance oldstyle:
>     IP1: 3766
>     IP2: 3359
>     Py:  4797
>
> Function call:
>     IP1: 937
>     IP2: 906
>     Py: 3313
>
> Create function: 25% slower
>     IP1: 2828
>     IP2: 3640
>     Py:  2766
>
> Define newstyle (1 000 000):
>     IP1: 42047
>     IP2: 20484
>     Py:  23921
>
> Define oldstyle (1 000 000): 33% slower
>     IP1: 1781
>     IP2: 2671
>     Py:  2108
>
> Comparing (== and !=):
>     IP1: 278597
>     IP2: 117662
>     Py:   62423
>
> Sets:
>     IP1: 37095
>     IP2: 30860
>     Py:   8047
>
> Lists (10 000): 50% slower
>     IP1: 10422
>     IP2: 16109
>     Py:   6094
>
> Recursion (10 000):
>     IP1: 1125
>     IP2: 1000
>     Py:  3609
>
> Sets2 (100 000): 500% slower
>     IP1:  4984
>     IP2: 30547
>     Py:   1203
>
> func_with_args:
>     IP1: 6312
>     IP2: 5906
>     Py: 11250
>
> method_with_args:
>     IP1: 20594
>     IP2: 11813
>     Py:  14875
>
> method_with_kwargs:
>     IP1: 27953
>     IP2: 11187
>     Py:  20032
>
> import: 15% slower
>     IP1: 28469
>     IP2: 32000
>     Py:  25782
>
> global: 20% slower
>     IP1: 1047
>     IP2: 1203
>     Py:  4141
>
> Exceptions (100000): 40% slower
>     IP1: 4703
>     IP2: 6125
>     Py:   266
>
> Engine execution: 8000% slower!!
>     IP1: 1600
>     IP2: 115002
>
>
> Michael Foord wrote:
>
>> Hello all,
>>
>> I've ported Resolver One to run on IronPython 2 Beta 4 to check for
>> any potential problems (we will only do a *proper* port once IP 2 is
>> out of beta).
>>
>> The basic porting was straightforward and several bugs have been fixed
>> since IP 2 B3 - many thanks to the IronPython team.
>>
>> The good news is that Resolver One on IronPython 2 is only 30-50% slower
>> than on IronPython 1! (It was 300-400% slower on top of IP 2 B3.)
>> Resolver One is fairly heavily optimised around the performance
>> hotspots of IronPython 1, so we expect to have to do a fair bit of
>> profiling and refactoring to readjust to the performance profile of IP 2.
>>
>> Having said that, there are a few oddities (and the areas that slow
>> down vary tremendously depending on which spreadsheet we use to
>> benchmark it - making it fairly difficult to track down the hotspots).
>>
>> We have one particular phase of spreadsheet calculation that takes
>> 0.4 seconds on IP1 and around 6 seconds on IP2, so I have been doing
>> some micro-benchmarking to try and identify the hotspot. I've
>> certainly found part of the problem.
>>
>> For those that are interested I've attached the very basic
>> microbenchmarks I've been using. The nice thing is that in *general*
>> IP2 does outperform IP1.
>>
>> The results that stand out in the other direction are:
>>
>> Using sets with custom classes (that define '__eq__', '__ne__' and
>> '__hash__') seems to be 6 times slower in IronPython 2.
>>
>> Adding lists together is about 50% slower.
>>
>> Defining functions seems to be 25% slower and defining old style
>> classes about 33% slower. (Creating instances of new style classes is
>> massively faster though - thanks!)
>>
>> The code I used to test sets (sets2.py) is as follows:
>>
>> from System import DateTime
>>
>> class Thing(object):
>>    def __init__(self, val):
>>        self.val = val
>>
>>    def __eq__(self, other):
>>        return self.val == other.val
>>
>>    def __neq__(self, other):
>>        return not self.__eq__(other)
>>
>>    def __hash__(self):
>>        return hash(self.val)
>>
>>
>> def test(s):
>>    a = set()
>>    for i in xrange(100000):
>>        a.add(Thing(i))
>>        a.add(Thing(i+1))
>>        Thing(i) in a
>>        Thing(i+2) in a
>>    return (DateTime.Now - s).TotalMilliseconds
>>
>> s = DateTime.Now
>> print test(s)
>>
>>
>> Interestingly the time taken is exactly the same if I remove the
>> definition of '__hash__'.
>>
>> The full set of results below:
>>
>> Results in milliseconds with a granularity of about 15ms and so an
>> accuracy of +/- ~60ms.
>> All testing with 10 000 000 operations unless otherwise stated.
>>
>> Empty loop (overhead):
>>    IP1: 421.9
>>    IP2: 438
>>
>> Create instance newstyle:
>>    IP1: 20360
>>    IP2: 1109
>>
>> Create instance oldstyle:
>>    IP1: 3766
>>    IP2: 3359
>>
>> Function call:
>>    IP1: 937
>>    IP2: 906
>>
>> Create function: 25% slower
>>    IP1: 2828
>>    IP2: 3640
>>
>> Define newstyle (1 000 000):
>>    IP1: 42047
>>    IP2: 20484
>>
>> Define oldstyle (1 000 000): 33% slower
>>    IP1: 1781
>>    IP2: 2671
>>
>> Comparing (== and !=):
>>    IP1: 278597
>>    IP2: 117662
>>
>> Sets (with numbers):
>>    IP1: 37095
>>    IP2: 30860
>>
>> Lists (10 000): 50% slower
>>    IP1: 10422
>>    IP2: 16109
>>
>> Recursion (10 000):
>>    IP1: 1125
>>    IP2: 1000
>>
>> Sets2 (100 000): 600% slower
>>    IP1: 4984
>>    IP2: 30547
>>
>>
>> I'll be doing more, as the 600% slowdown for sets and the 50% slowdown
>> for lists account for some of the dependency analysis problem but not
>> all of it.
>>
>> Many Thanks
>>
>> Michael Foord
>> --
>> http://www.resolversystems.com
>> http://www.ironpythoninaction.com
>>
>>
>>
>
>
> --
> http://www.ironpythoninaction.com/
> http://www.voidspace.org.uk/
> http://www.trypython.org/
> http://www.ironpython.info/
> http://www.resolverhacks.net/
> http://www.theotherdelia.co.uk/
>


--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/
http://www.trypython.org/
http://www.ironpython.info/
http://www.theotherdelia.co.uk/
http://www.resolverhacks.net/


_______________________________________________
Users mailing list
Users@lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
