Hello,

I have been working to track down the origin of the performance penalty
exposed by this bug.

All the tests I am performing are run against a locally compiled version of
Python 2.7.12 (built from upstream sources, without applying any Ubuntu
patches), built with two different versions of GCC: 5.3.1 (current) and 4.8.0,
both coming from the Ubuntu archives.

As I mentioned in my previous comments (check the full comparison stats), I
can see significant performance differences just by switching the GCC version.
I decided to focus my investigation on the pickle module, since it seems to be
the most affected one, being approximately 1.17x slower between the two GCC
versions.

Due to the amount of changes introduced between 4.8.0 and 5.3.1, I decided not
to pursue a bisection of those changes to identify an offending commit yet,
until we can determine which optimization or compile-time change is causing
the regression and focus the investigation on that specific area.

My understanding is that the performance penalty caused by the compiler might
be related to two factors: an important change in the linked libc, or an
optimization made by the compiler in the resulting object.

Since the resulting objects are linked against the same glibc version (2.23),
I will not consider that factor as part of the analysis; instead I will focus
on analyzing the performance of the objects generated by each compiler.

Following this approach, I ran the pyperformance suite under a valgrind
session, excluding all benchmarks except the pickle one and using the default
suppressions to avoid missing any reference in the Python runtime, with the
following arguments:

valgrind --tool=callgrind --instr-atstart=no --trace-children=yes \
    venv/cpython2.7-6ed9b6df9cd4/bin/python -m performance run \
    --python /usr/local/bin/python2.7 -b pickle --inside-venv

I ran this process multiple times with both the GCC 4.8.0 and 5.3.1 builds to
produce a large set of callgrind files to analyze. Those callgrind files
contain the full execution tree, including all the relocations, jumps, and
calls into libc and the Python runtime itself, and of course the time spent
per function and the number of calls made to it.

I cleaned up the resulting callgrind files, removing the files smaller than
100k and the ones that were not loading the cPickle extension
(https://pastebin.canonical.com/175951/).
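
The filtering itself is straightforward; a minimal sketch of it (the exact
script is the one in the paste above, and the directory name below is just a
placeholder):

    import glob
    import os

    # Drop callgrind output files smaller than 100k, plus any whose
    # recorded call tree never touches the cPickle extension.
    for path in glob.glob("callgrind-results/callgrind.out.*"):
        if os.path.getsize(path) < 100 * 1024:
            os.remove(path)
            continue
        with open(path) as f:
            if "cPickle" not in f.read():
                os.remove(path)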

Over that set of files I executed callgrind_annotate to generate the stats per
function, ordered by the exclusive cost of each function. Then, with this
script (http://paste.ubuntu.com/23795048/), I summed all the costs per
function for each GCC version (4.8.0 and 5.3.1) and calculated the variance in
cost between them.
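
In essence the script does something along these lines (a rough sketch; the
annotate file names are placeholders and the exact parsing is in the paste
above):

    import glob
    import re
    from collections import defaultdict

    # Matches callgrind_annotate cost lines such as:
    #   1,234,567  /path/to/file.c:function_name [...]
    COST_LINE = re.compile(r"^\s*([\d,]+)\s+(\S+:\S+)")

    def totals(annotate_files):
        costs = defaultdict(float)
        for name in annotate_files:
            with open(name) as f:
                for line in f:
                    m = COST_LINE.match(line)
                    if m:
                        costs[m.group(2)] += float(m.group(1).replace(",", ""))
        return costs

    gcc48 = totals(glob.glob("annotate-gcc-4.8-*.txt"))
    gcc53 = totals(glob.glob("annotate-gcc-5.3.1-*.txt"))

    for func in sorted(set(gcc48) & set(gcc53)):
        if gcc48[func] > 0:
            variance = gcc53[func] / gcc48[func] - 1
            print("%s %f %f (variance: %f)"
                  % (func, gcc48[func], gcc53[func], variance))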

The resulting file contains one tuple per function, with the following format:

function name - GCC 4.8 cost - GCC 5.3.1 cost - relative variance

As an example:

/home/ubuntu/python/cpython/Objects/tupleobject.c:tupleiter_dealloc 258068.000000 445009.000000 (variance: 0.724387)
/home/ubuntu/python/cpython/Objects/object.c:try_3way_compare 984860.000000 1676351.000000 (variance: 0.702121)
/home/ubuntu/python/cpython/Python/marshal.c:r_object 183524.000000 27742.000000 (variance: -0.848837)
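
The variance here is the relative change from the GCC 4.8 cost to the GCC
5.3.1 cost, i.e. (cost_5.3.1 / cost_4.8) - 1. For tupleiter_dealloc that is
445009 / 258068 - 1 ≈ 0.724, roughly 72% more cost under the 5.3.1 build,
while the negative value for r_object means that function actually got
cheaper.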

The full results, sorted by variance in descending order, can be found here:
http://paste.ubuntu.com/23795023/

Now that we have these results, we can move forward by comparing the generated
code for the functions with the largest variance and tracking down which
optimization done by GCC might be altering the resulting objects.
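
One way to do that comparison is to diff the objdump disassembly of an
affected function between the two builds; a minimal sketch, assuming the two
binaries are kept side by side (the paths below are placeholders):

    import difflib
    import subprocess

    def disassemble(binary, function):
        # Return the objdump listing for a single function of the binary.
        dump = subprocess.check_output(["objdump", "-d", binary]).decode()
        lines = []
        inside = False
        for line in dump.splitlines():
            if line.endswith("<%s>:" % function):
                inside = True
            elif inside and not line.strip():
                break  # a blank line ends the function's listing
            if inside:
                lines.append(line)
        return lines

    old = disassemble("python2.7-gcc-4.8/python", "try_3way_compare")
    new = disassemble("python2.7-gcc-5.3.1/python", "try_3way_compare")
    # Addresses and offsets will differ between the builds, so the diff is
    # only indicative; the interesting part is the instruction mix.
    for line in difflib.unified_diff(old, new, "gcc-4.8", "gcc-5.3.1",
                                     lineterm=""):
        print(line)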

I will update this case after further investigation.

https://bugs.launchpad.net/bugs/1638695

Title:
  Python 2.7.12 performance regression

Status in python2.7 package in Ubuntu:
  Confirmed

Bug description:
  I work on the OpenStack-Ansible project and we've noticed that testing
  jobs on 16.04 take quite a bit longer to complete than on 14.04.  They
  complete within an hour on 14.04 but they normally take 90 minutes or
  more on 16.04.  We use the same version of Ansible with both versions
  of Ubuntu.

  After more digging, I tested python performance (using the
  'performance' module) on 14.04 (2.7.6) and on 16.04 (2.7.12).  There
  is a significant performance difference between each version of
  python.  That is detailed in a spreadsheet[0].

  I began using perf to dig into the differences when running the python
  performance module and when using Ansible playbooks.  CPU migrations
  (as measured by perf) are doubled in Ubuntu 16.04 when running the
  same python workloads.

  I tried changing some of the kernel.sched sysctl configurables but they
  had very little effect on the results.

  I compiled python 2.7.12 from source on 14.04 and found the
  performance to be unchanged there.  I'm not entirely sure where the
  problem might be now.

  We also have a bug open in OpenStack-Ansible[1] that provides
  additional detail. Thanks in advance for any help you can provide!

  [0] 
https://docs.google.com/spreadsheets/d/18MmptS_DAd1YP3OhHWQqLYVA9spC3xLt4PS3STI6tds/edit?usp=sharing
  [1] https://bugs.launchpad.net/openstack-ansible/+bug/1637494
