Nice! Thanks for the **kw fix; that was really bugging us. It looks like 2.0b5 is where we'll start seriously looking at switching to 2.0. Do you know when it's expected?

On Tue, Aug 19, 2008 at 7:09 PM, Dino Viehland <[EMAIL PROTECTED]> wrote:
> The fixes for these (specifically the ones I said I had fixes for :) ) are now checked in and the next source code push (as well as Beta 5) will have the fixes.
>
> In addition to these issues I've fixed the **kw perf bug recently reported (http://www.codeplex.com/IronPython/WorkItem/View.aspx?WorkItemId=17679). I've also attempted to remove some of the overhead of using sets - we had an extra level of indirection that wasn't necessary, and we could also reduce some hashing (where we'd hash twice) that we had to do now that we don't use .NET dictionaries directly. It'd be interesting to see whether that helps on your machine (the one that shows sets running slowly). Finally, creating instances w/ __slots__ was horribly slow and that's fixed as well. Here's the code review mail if anyone's interested in the details:
>
> DLR: OptimizedScriptCode shouldn't be re-writing the AST every time through. I've added a cache so we'll only re-write it once.
>
> Python:
> Speed up dictionary creation by avoiding repeated locking. Now we new up an object array w/ the values and pass it off to create in one action.
>
> List addition should leave a little bit of buffer room for further operations. Also adding fast paths for extend with well-known types. Finally switching OrderedLocker to being a struct and ensuring it handles the case where the objects are the same.
>
> Set: Removing a level of indirection - we now hold directly onto a CommonDictionaryStorage instead of going through PythonDictionary for the same effect. I've also removed several __contains__ checks which aren't necessary because CommonDictionaryStorage guarantees the correct semantics (replacing an existing value only updates the keys).
>
> PythonFunction - speed up function creation by looking directly in the global scope dict and creating the FunctionCode.
> Lazily creating this does no good as it's always created and the Interlocked.CompareExchange call isn't cheap.
>
> Calling new-style classes no longer looks up __init__; it now just does a TryGetBoundValue on it.
>
> Creating instances of types w/ __slots__ is very slow. Switched to using an object array and we no longer require code gen for generating the accessors. This all around simplifies the NewTypeMaker code as well as the ReflectedSlotProperty code.
>
> 17679 - Performance issue when calling a function with **kw
> We now go through the DictionaryStorage Clone API when we have a Python dictionary. Also added a "HasNonStringAttributes" method onto DictionaryStorage which SymbolId dicts can optimize away. But CommonDictionaryStorage can also optimize it by avoiding lots of locking. Finally we now throw a reasonable exception when we don't get a dictionary.
>
> Also making EnvironmentDictionaryStorage not inherit from CommonDictionaryStorage. AddNoLock was certainly not doing the right thing and this way it's clearer what's going on there.
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Foord
> Sent: Friday, August 15, 2008 11:41 AM
> To: Discussion of IronPython
> Subject: Re: [IronPython] Performance of IronPython 2 Beta 4 and IronPython 1
>
> Dino Viehland wrote:
> > Ok, I looked into a bunch of these and here's what I've discovered so far and other random comments...
> >
> > Exceptions (100000): 40% slower
> > IP1: 4703
> > IP2: 6125
> > Py: 266
> >
> > I haven't looked at this one yet. I do know that we have a number of bug fixes for our exception handling which will slow it down though. I don't consider this to be a high priority though. If we wanted to focus on exception perf I think we'd want to do something radical rather than small tweaks to the existing code.
> > If there are certain scenarios where exception perf is critical though, it'd be interesting to hear about those and whether we can do anything to improve them.
> >
>
> I can look at this.
>
> > Engine execution: 8000% slower!!
> > IP1: 1600
> > IP2: 115002
> >
> > This is just a silly bug. We're doing a tree re-write of the AST and we do that every time through. Caching that re-write gets us back to 1.x performance. I have a fix for this.
> >
>
> Great! (1.x performance was very impressive.)
>
> > Create function: 25% slower
> > IP1: 2828
> > IP2: 3640
> > Py: 2766
> >
> > Part of this is from a bug fix but the fix could be more efficient. In 1.x we don't look up __module__ from the global scope. In 2.x we do this lookup but it searches all scopes - which isn't even correct. But we can do a direct lookup which is a little faster - so I have a partial fix for this. This will still be a little slower than 1.x though.
> >
>
> Ok.
>
> > Define oldstyle (1 000 000): 33% slower
> > IP1: 1781
> > IP2: 2671
> > Py: 2108
> >
> > Is this critical? I'd rather just live w/ the slowness than fix something that will be gone in 3.x :)
> >
>
> Not a problem for us - I merely noted it. In 1.x we needed to switch a few classes to old style for performance reasons (but we don't repeatedly redefine them - it was instantiation time). In 2.x we will need to switch back (which is great).
>
> > Lists (10 000): 50% slower
> > IP1: 10422
> > IP2: 16109
> > Py: 6094
> >
> > The primary issue here is that adding 2 lists ends up creating a new list whose storage is the exact size needed for storing the two lists. When you append to it after adding it we need to allocate a brand new array - and you're not dealing with small arrays here. We can add a little extra space depending on the size of the array to minimize the chance of needing a re-size. That gets us to about 10% slower than CPython.
> > I'm also going to add a strongly typed extend overload which should make those calls a little faster.
> >
>
> Python lists will typically grow so that they always have a lot of spare space. Creating a list with no extra space seems like a problem. My benchmark for this was unrealistic though (we add lists and extend them a lot - but typically they're nothing like that size).
>
> > Sets2 (100 000): 500% slower
> > IP1: 4984
> > IP2: 30547
> > Py: 1203
> >
> > This one I actually cannot repro yet (I've tried it on 3 machines but they've all been Vista). I'm going to try next on a Srv 2k3 machine and see if I can track it down. But more information would be useful.
> >
>
> Hmmm... I wonder if it is an oddity with my machine. Unfortunately I am not at work today and can't repeat it. I've just run it on Vista (.NET 2.0.50727.3053) running under VMware Fusion (but on a kick-arse machine).
>
> IP1.1.2: 3515
> IP2.0B4: 2516
>
> I need to rerun the whole Resolver port on someone else's machine.
>
> > Comparing (== and !=):
> > IP1: 278597
> > IP2: 117662
> >
> > This one is actually pretty interesting (even though we're faster in 2.x) - there's an issue with the test here. You've defined "__neq__" instead of "__ne__".
>
> Ha! Oops. :-)
>
> > That causes the != comparison to ultimately compare based upon object identity - which is extremely slow. There might be some things we can do to make the object identity comparison faster (for example, recognizing that we're doing equality and just need an eq or ne answer rather than a 1, -1, 0 comparison value). But I'm going to assume comparing on object identity isn't very important right now - let me know if I'm wrong.
>
> We do use identity comparison a lot - but I'm not sure if it is in performance critical parts of our code. I can review this.
>
> > But switching this to __ne__ causes us to be a little faster than CPython.
> > They have a great advantage on object identity comparisons - they can just use the object's address.
> >
>
> Sure.
>
> > I was also curious what happens to this case if we use __slots__. That identified yet another massive performance regression which I have a fix for - creating instances that have __slots__ defined is horribly slow. With that bug fixed, and using slots and __ne__ instead of __neq__, we can actually run this over 2x faster than CPython (on Vista x86 .NET 3.5SP1 on a 2.4 GHz Core 2 w/ 4 GB of RAM).
> >
>
> Cool.
>
> Michael
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Foord
> > Sent: Thursday, August 14, 2008 9:42 AM
> > To: Discussion of IronPython
> > Subject: Re: [IronPython] Performance of IronPython 2 Beta 4 and IronPython 1
> >
> > Just for fun I also compared with CPython. The results are interesting; I'll turn it into a blog post of course...
> >
> > Results in milliseconds with a granularity of about 15ms and so an accuracy of +/- ~60ms.
> > All testing with 10 000 000 operations unless otherwise stated.
> >
> > The version of Python I compared against was Python 2.4.
> >
> > Empty loop (overhead):
> > IP1: 422
> > IP2: 438
> > Py: 3578
> >
> > Create instance newstyle:
> > IP1: 20360
> > IP2: 1109
> > Py: 4063
> >
> > Create instance oldstyle:
> > IP1: 3766
> > IP2: 3359
> > Py: 4797
> >
> > Function call:
> > IP1: 937
> > IP2: 906
> > Py: 3313
> >
> > Create function: 25% slower
> > IP1: 2828
> > IP2: 3640
> > Py: 2766
> >
> > Define newstyle (1 000 000):
> > IP1: 42047
> > IP2: 20484
> > Py: 23921
> >
> > Define oldstyle (1 000 000): 33% slower
> > IP1: 1781
> > IP2: 2671
> > Py: 2108
> >
> > Comparing (== and !=):
> > IP1: 278597
> > IP2: 117662
> > Py: 62423
> >
> > Sets:
> > IP1: 37095
> > IP2: 30860
> > Py: 8047
> >
> > Lists (10 000): 50% slower
> > IP1: 10422
> > IP2: 16109
> > Py: 6094
> >
> > Recursion (10 000):
> > IP1: 1125
> > IP2: 1000
> > Py: 3609
> >
> > Sets2 (100 000): 500% slower
> > IP1: 4984
> > IP2: 30547
> > Py: 1203
> >
> > func_with_args:
> > IP1: 6312
> > IP2: 5906
> > Py: 11250
> >
> > method_with_args:
> > IP1: 20594
> > IP2: 11813
> > Py: 14875
> >
> > method_with_kwargs:
> > IP1: 27953
> > IP2: 11187
> > Py: 20032
> >
> > import: 15% slower
> > IP1: 28469
> > IP2: 32000
> > Py: 25782
> >
> > global: 20% slower
> > IP1: 1047
> > IP2: 1203
> > Py: 4141
> >
> > Exceptions (100000): 40% slower
> > IP1: 4703
> > IP2: 6125
> > Py: 266
> >
> > Engine execution: 8000% slower!!
> > IP1: 1600
> > IP2: 115002
> >
> > Michael Foord wrote:
> >
> >> Hello all,
> >>
> >> I've ported Resolver One to run on IronPython 2 Beta 4 to check for any potential problems (we will only do a *proper* port once IP 2 is out of beta).
> >>
> >> The basic porting was straightforward and several bugs have been fixed since IP 2 B3 - many thanks to the IronPython team.
> >>
> >> The good news is that Resolver One is only 30-50% slower than Resolver One on IronPython 1! (It was 300 - 400% slower on top of IP 2 B3.)
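The percentage labels in these results are rough estimates. One way to cross-check them is to recompute the relative slowdown, (IP2 - IP1) / IP1, directly from the raw timings. This is a minimal Python 3 sketch; `TIMINGS` and `slowdown_pct` are illustrative names, and the numbers are copied from the results above.

```python
# Raw timings in milliseconds as (name, IP1, IP2), copied from the results above.
TIMINGS = [
    ("Create function", 2828, 3640),
    ("Define oldstyle", 1781, 2671),
    ("Lists", 10422, 16109),
    ("Sets2", 4984, 30547),
    ("import", 28469, 32000),
    ("global", 1047, 1203),
    ("Exceptions", 4703, 6125),
    ("Engine execution", 1600, 115002),
]


def slowdown_pct(baseline_ms, new_ms):
    """Percent by which new_ms is slower than baseline_ms (negative means faster)."""
    return (new_ms - baseline_ms) * 100.0 / baseline_ms


for name, ip1, ip2 in TIMINGS:
    # Print the percentage slowdown of IP2 relative to IP1, plus the raw ratio.
    print("%-17s %8.1f%% slower (%.2fx)" % (name, slowdown_pct(ip1, ip2), ip2 / float(ip1)))
```

Measured this way, Engine execution is roughly 72x slower (about 7,090%), Sets2 about 513% (6.1x, consistent with the "6 times slower" description), Exceptions about 30% and Define oldstyle about 50%. The "33%" label on the Define oldstyle row matches (IP2 - IP1) / IP2 instead, i.e. IP1 taking about a third less time than IP2, so it matters which run is treated as the baseline.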
> >>
> >> Resolver One is fairly heavily optimised around the performance hotspots of IronPython 1, so we expect to have to do a fair bit of profiling and refactoring to readjust to the performance profile of IP 2.
> >>
> >> Having said that, there are a few oddities (and the areas that slow down vary tremendously depending on which spreadsheet we use to benchmark it - making it fairly difficult to track down the hotspots).
> >>
> >> We have one particular phase of spreadsheet calculation that takes 0.4 seconds on IP1 and around 6 seconds on IP2, so I have been doing some micro-benchmarking to try and identify the hotspot. I've certainly found part of the problem.
> >>
> >> For those who are interested, I've attached the very basic microbenchmarks I've been using. The nice thing is that in *general* IP2 does outperform IP1.
> >>
> >> The results that stand out in the other direction are:
> >>
> >> Using sets with custom classes (that define '__eq__', '__ne__' and '__hash__') seems to be 6 times slower in IronPython 2.
> >>
> >> Adding lists together is about 50% slower.
> >>
> >> Defining functions seems to be 25% slower and defining old style classes about 33% slower. (Creating instances of new style classes is massively faster though - thanks!)
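The list slowdown explained earlier in the thread (a + b producing storage of exactly the combined size, so the next append must allocate a new array and copy everything) can be illustrated with a toy model. This is a hypothetical Python 3 sketch, not IronPython's list implementation: `ToyList` and `concat` are made-up names, doubling on a full append is an assumed growth policy, and the ~1/8 headroom is an assumed policy similar in spirit to the proportional over-allocation CPython uses when a list grows.

```python
class ToyList(object):
    """Hypothetical array-backed list that counts elements copied on reallocation."""

    def __init__(self, items, extra=0):
        self.size = len(items)
        # Backing array: the live items followed by `extra` unused slots.
        self.slots = list(items) + [None] * extra
        self.copied = 0

    @property
    def capacity(self):
        return len(self.slots)

    def append(self, value):
        if self.size == self.capacity:
            # Full: allocate a bigger array (doubling is an assumed policy) and copy.
            new_slots = [None] * max(4, 2 * self.capacity)
            new_slots[:self.size] = self.slots[:self.size]
            self.copied += self.size
            self.slots = new_slots
        self.slots[self.size] = value
        self.size += 1


def concat(a, b, headroom):
    """Model of `a + b`: exact-size storage, or ~1/8 spare slots when headroom=True."""
    items = a.slots[:a.size] + b.slots[:b.size]
    extra = len(items) >> 3 if headroom else 0
    return ToyList(items, extra)


a = ToyList(range(10000))
exact = concat(a, a, headroom=False)
exact.append(0)  # storage is full, so all 20000 existing elements are copied
roomy = concat(a, a, headroom=True)
roomy.append(0)  # fits in the spare slots, nothing is copied
print("%d %d" % (exact.copied, roomy.copied))  # 20000 0
```

The trade-off is a little wasted memory per concatenated list in exchange for avoiding a full copy on the first append, which is the pattern ("we add lists and extend them a lot") described in the thread.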
> >>
> >> The code I used to test sets (sets2.py) is as follows:
> >>
> >> from System import DateTime
> >>
> >> class Thing(object):
> >>     def __init__(self, val):
> >>         self.val = val
> >>     def __eq__(self, other):
> >>         return self.val == other.val
> >>
> >>     def __neq__(self):
> >>         return not self.__eq__(other)
> >>     def __hash__(self):
> >>         return hash(self.val)
> >>
> >> def test(s):
> >>     a = set()
> >>     for i in xrange(100000):
> >>         a.add(Thing(i))
> >>         a.add(Thing(i+1))
> >>         Thing(i) in a
> >>         Thing(i+2) in a
> >>     return (DateTime.Now - s).TotalMilliseconds
> >>
> >> s = DateTime.Now
> >> print test(s)
> >>
> >> Interestingly the time taken is exactly the same if I remove the definition of '__hash__'.
> >>
> >> The full set of results below:
> >>
> >> Results in milliseconds with a granularity of about 15ms and so an accuracy of +/- ~60ms.
> >> All testing with 10 000 000 operations unless otherwise stated.
> >>
> >> Empty loop (overhead):
> >> IP1: 421.9
> >> IP2: 438
> >>
> >> Create instance newstyle:
> >> IP1: 20360
> >> IP2: 1109
> >>
> >> Create instance oldstyle:
> >> IP1: 3766
> >> IP2: 3359
> >>
> >> Function call:
> >> IP1: 937
> >> IP2: 906
> >>
> >> Create function: 25% slower
> >> IP1: 2828
> >> IP2: 3640
> >>
> >> Define newstyle (1 000 000):
> >> IP1: 42047
> >> IP2: 20484
> >>
> >> Define oldstyle (1 000 000): 33% slower
> >> IP1: 1781
> >> IP2: 2671
> >>
> >> Comparing (== and !=):
> >> IP1: 278597
> >> IP2: 117662
> >>
> >> Sets (with numbers):
> >> IP1: 37095
> >> IP2: 30860
> >>
> >> Lists (10 000): 50% slower
> >> IP1: 10422
> >> IP2: 16109
> >>
> >> Recursion (10 000):
> >> IP1: 1125
> >> IP2: 1000
> >>
> >> Sets2 (100 000): 600% slower
> >> IP1: 4984
> >> IP2: 30547
> >>
> >> I'll be doing more, as the 600% slowdown for sets and the 50% slowdown for lists account for some of the dependency analysis problem but not all of it.
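For anyone re-running the sets2 benchmark, a corrected and portable version might look like the sketch below. It spells the special method `__ne__` (a method named `__neq__` is never invoked by `!=`, which is the issue identified later in the thread), gives it the missing `other` parameter, and times with the standard library's `timeit.default_timer` instead of .NET's `System.DateTime`, so it also runs on CPython. It is written for Python 3, where `!=` would fall back to inverting `__eq__` anyway; under Python 2 and IronPython 2.0 it does not, so the explicit `__ne__` matters there. `run_sets2` is a hypothetical helper name.

```python
import timeit


class Thing(object):
    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        if not isinstance(other, Thing):
            return NotImplemented
        return self.val == other.val

    def __ne__(self, other):
        # `!=` calls __ne__; the misspelled __neq__ in the original is never called.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

    def __hash__(self):
        return hash(self.val)


def run_sets2(n=100000):
    """Run the sets2 workload; return (set size, membership hits, elapsed ms)."""
    start = timeit.default_timer()
    a = set()
    hits = 0
    for i in range(n):
        a.add(Thing(i))
        a.add(Thing(i + 1))
        if Thing(i) in a:  # always present: it was just added
            hits += 1
        if Thing(i + 2) in a:  # never present yet
            hits += 1
    elapsed_ms = (timeit.default_timer() - start) * 1000.0
    return len(a), hits, elapsed_ms


if __name__ == "__main__":
    size, hits, elapsed = run_sets2()
    # size is 100001 and hits is 100000; the timing depends on the machine.
    print("%d items, %d hits, %.0f ms" % (size, hits, elapsed))
```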
> >>
> >> Many Thanks
> >>
> >> Michael Foord
> >> --
> >> http://www.resolversystems.com
> >> http://www.ironpythoninaction.com
> >>
> >> _______________________________________________
> >> Users mailing list
> >> [email protected]
> >> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> >>
> >
> > --
> > http://www.ironpythoninaction.com/
> > http://www.voidspace.org.uk/
> > http://www.trypython.org/
> > http://www.ironpython.info/
> > http://www.resolverhacks.net/
> > http://www.theotherdelia.co.uk/
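The **kw fix (work item 17679) is easier to appreciate with the call semantics in mind: a function called as f(**d) receives its own new dictionary, so every such call has to copy the caller's mapping, which is presumably why a cheap DictionaryStorage clone path helps. The sketch below, written for Python 3 with hypothetical function names, shows the copy semantics and times the call pattern with the standard `timeit` module.

```python
import timeit


def takes_kwargs(**kwargs):
    # kwargs is a fresh dict: adding a key here must not leak back to the caller.
    kwargs["seen"] = True
    return len(kwargs)


def call_with_kwargs(n, kw):
    """Call takes_kwargs(**kw) n times and return the summed kwargs sizes."""
    total = 0
    for _ in range(n):
        total += takes_kwargs(**kw)
    return total


if __name__ == "__main__":
    kw = {"a": 1, "b": 2, "c": 3}
    elapsed = timeit.timeit(lambda: call_with_kwargs(1000, kw), number=100)
    print("100000 calls: %.1f ms" % (elapsed * 1000))
    print(sorted(kw))  # ['a', 'b', 'c']: the caller's dict is unchanged
```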
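The "Engine execution" regression discussed in the thread came from re-running a tree rewrite of the AST on every execution; the fix caches the rewritten result so it happens only once. The same idea can be sketched in Python 3.8+ terms. `ScriptCode` and `ConstantFolder` are hypothetical stand-ins (the folding pass just plays the role of an expensive rewrite), and the actual DLR fix is C# code working on DLR expression trees rather than Python's `ast` module.

```python
import ast


class ConstantFolder(ast.NodeTransformer):
    """Stand-in for an expensive rewrite pass: folds `<number> + <number>`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite children first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, (int, float))
                and isinstance(node.right.value, (int, float))):
            folded = ast.Constant(value=node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node


class ScriptCode(object):
    """Hypothetical script wrapper that rewrites and compiles its AST only once."""

    def __init__(self, source):
        self.source = source
        self._compiled = None  # cache for the rewritten, compiled code
        self.rewrites = 0

    def _rewrite_and_compile(self):
        tree = ConstantFolder().visit(ast.parse(self.source))
        self.rewrites += 1
        return compile(ast.fix_missing_locations(tree), "<script>", "exec")

    def run(self, namespace):
        if self._compiled is None:  # only the first run pays for the rewrite
            self._compiled = self._rewrite_and_compile()
        exec(self._compiled, namespace)
        return namespace


if __name__ == "__main__":
    script = ScriptCode("x = 2 + 3")
    for _ in range(1000):
        ns = script.run({})
    print(ns["x"], script.rewrites)  # 5 1
```

Without the `_compiled` cache, each of the 1000 runs would parse, rewrite and compile again, which is the shape of the bug described for the 2.0 beta.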
