See the forwarded message. I just added the following in the buildout
section of my ~/.buildout/default.cfg:
index = http://download.zope.org/ppix
Without it, refreshing a small buildout of mine takes 2m44s. With it,
it takes about 15 seconds.
Jim
Begin forwarded message:
From: Jim Fulton <[EMAIL PROTECTED]>
Date: July 19, 2007 7:06:34 AM EDT
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Prototype setuptools-specific PyPI index.
Over the past few months, we've struggled quite a bit with Python
Package Index (PyPI) performance and stability. Thanks to the
heroic efforts of Martin v. Löwis and others, performance and
especially stability have improved quite a bit. Martin has
demonstrated that, at least when running well, PyPI seems to answer
most requests on the order of 7 milliseconds (around 150 requests
per second) internally. That's not bad. Unfortunately for users,
actual times can be quite a bit longer. For me at work, requests
take around 300 milliseconds. For Martin, they seem to take
somewhat longer. 300 milliseconds isn't so bad for a request or
two; however, easy_install can easily make tens or even hundreds of
requests to satisfy a user request for a package. zc.buildout,
when verifying that a large system with many tens of packages has
the most up to date versions of each package can easily make
thousands of requests.
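
To put rough numbers on this, here is a back-of-the-envelope sketch. It
assumes requests are made one after another and counts only the round
trip (no page download or parsing time), so it is a lower bound at best.
The function name is made up for illustration.

```python
# Rough estimate only: total wait = number of requests x per-request latency.
LATENCY_S = 0.300  # the ~300 ms per request mentioned above


def total_wait_seconds(requests, latency_s=LATENCY_S):
    """Time spent waiting on the index if requests are strictly serial."""
    return requests * latency_s


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        print(f"{n:5d} requests -> {total_wait_seconds(n):6.1f} s")
```

At 300 milliseconds each, a thousand serial requests is on the order of
five minutes of waiting before anything is downloaded.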
Why do setuptools and buildout make so many requests? If a package
exposes more than one release, then setuptools checks the package's
main PyPI page and the pages for each release. We need to be able
to easily use older releases, so we can't hide old releases.
Typical projects of ours have many old releases exposed. Setuptools
could be more clever in the way it searches PyPI, but it
would still have to make a minimum of 2 requests per package for
packages with multiple versions exposed.
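
The request count described here can be modelled roughly as follows.
This is a sketch of the behaviour as I describe it, not setuptools'
actual code; the function and the extra_links parameter (off-site home
page or download links that also get read) are hypothetical, and the
single-release case is an assumption.

```python
def index_pages_read(exposed_releases, extra_links=0):
    """Pages fetched for one package under the behaviour described above."""
    if exposed_releases <= 1:
        # Assumption: with a single exposed release, only the main page
        # (plus any off-site links) needs to be read.
        return 1 + extra_links
    # Main page, one page per exposed release, plus any off-site links.
    return 1 + exposed_releases + extra_links


if __name__ == "__main__":
    # 13 exposed releases and one off-site home page, as in the
    # easy_install transcript later in this message.
    print(index_pages_read(13, extra_links=1))
```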
Another potential issue is that PyPI pages can be large. I've
found it convenient to use PyPI package pages as the home page for
many of my projects. I like to include package documentation in my
project pages. Perhaps this is an abuse of PyPI, but it is very
convenient for me and no one has complained. :) The zc.buildout
pages are around 200K. That's a fair bit of data for setuptools to
download and scan for download URLs.
In the course of this discussion, I've realized that it doesn't
make sense for setuptools to use the same interface that humans
use. setuptools doesn't need to see all of the data that is useful
to humans. Similarly, humans generally don't need to see all of the
historical releases for a project. I suggested a simple page
format designed just for setuptools. An alternative would be an
xmlrpc API. I prefer pages because I think that, over time, the
number of requests from automated tools like easy_install and
zc.buildout will increase substantially and, ultimately, will
overwhelm dynamic servers, even ones like PyPI that are reasonably
fast. I also think that a simple static collection of pages will
be easier to mirror and I think some number of geographic mirrors
is likely to help some people. I promised to prototype the format
I suggested.
I've created an experimental prototype setuptools-specific package
index at
http://download.zope.org/ppix
Going to that page gives brief instructions for using it with
easy_install and zc.buildout. To see an individual package page,
add the package name to the URL, as in:
http://download.zope.org/ppix/setuptools/
A few things to note about this:
- I don't expose a long package list at
http://download.zope.org/ppix/. The long package list would be
expensive to download and
supports a use case that I consider to be of negative value, which
is installing packages with case-insensitive package names. I
think it is important for humans to be able to search for packages
using case-insensitive search terms, but I think that, after
identifying a package, precise package names should be used. I
think it is especially important that precise package names be used
in package requirements.
- There is a single page per package. This can greatly reduce the
number of requests. Packages that store all of their distributions
in PyPI and that don't have off-site home pages or download URLs
can be scanned with a single request. Note that I excluded home
page and download URLs that pointed back to the package's PyPI page,
as that wouldn't provide any new information to setuptools.
- Download URLs for *hidden* packages are included. Humans don't
need to see old revisions, but setuptools-based tools do. If we
used an index like this for setuptools, we could stop unhiding old
releases when we created new releases in PyPI. This would make
PyPI more useful to humans and less of a pain for developers.
- Download URLs are the same as they are in PyPI. Using this new
index, distributions are still downloaded from PyPI, so the index
doesn't affect PyPI download statistics.
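
To illustrate why a single small page per package is cheap for a tool
to consume, here is a minimal sketch of scanning such a page for
download links. The page layout in SAMPLE_PAGE is an assumption made
for illustration (the real pages may differ), and this is not how
setuptools itself does the scan; the helper names are made up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def distribution_links(page_html, suffixes=(".egg", ".tar.gz", ".zip")):
    """Return links that look like distributions, judged by file suffix."""
    collector = LinkCollector()
    collector.feed(page_html)
    # Ignore any "#md5=..." fragment when checking the suffix.
    return [url for url in collector.links
            if url.split("#", 1)[0].endswith(suffixes)]


# Hypothetical page layout; the URLs are taken from the transcripts below.
SAMPLE_PAGE = """
<html><body>
<a href="http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d">zc.buildout-1.0.0b28-py2.5.egg</a>
<a href="http://svn.zope.org/zc.buildout">home page</a>
</body></html>
"""

if __name__ == "__main__":
    for url in distribution_links(SAMPLE_PAGE):
        print(url)
```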
To see the impact of this, it's interesting to look at installing
zc.buildout using easy_install from PyPI and from the experimental
index:
Installing using PyPI looks like this:
(env)[EMAIL PROTECTED]:~/tmp$ time easy_install zc.buildout
Searching for zc.buildout
Reading http://cheeseshop.python.org/pypi/zc.buildout/
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b19
Reading http://svn.zope.org/zc.buildout
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b22
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b23
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b20
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b21
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b26
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b27
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b24
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b25
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b28
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b17
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b16
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b18
Best match: zc.buildout 1.0.0b28
Downloading http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d
Processing zc.buildout-1.0.0b28-py2.5.egg
creating /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
Extracting zc.buildout-1.0.0b28-py2.5.egg to /home/jim/tmp/env/lib/python2.5
Adding zc.buildout 1.0.0b28 to easy-install.pth file
Installing buildout script to /home/jim/tmp/env/bin/
Installed /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
Processing dependencies for zc.buildout
Searching for setuptools==0.6c6
Best match: setuptools 0.6c6
Processing setuptools-0.6c6-py2.5.egg
Adding setuptools 0.6c6 to easy-install.pth file
Installing easy_install script to /home/jim/tmp/env/bin/
Installing easy_install-2.5 script to /home/jim/tmp/env/bin/
Installed /home/jim/tmp/env/lib/python2.5/setuptools-0.6c6-py2.5.egg
Processing dependencies for setuptools==0.6c6
Finished processing dependencies for setuptools==0.6c6
Finished installing setuptools==0.6c6
Finished processing dependencies for zc.buildout
Finished installing zc.buildout
real 0m31.360s
user 0m1.136s
sys 0m0.060s
Note the large number of pages read. Here I was installing a
single package with one dependency, setuptools, that was already
installed. Let's look at this again using the experimental index:
(env)[EMAIL PROTECTED]:~/tmp$ time easy_install -i http://download.zope.org/ppix zc.buildout
Searching for zc.buildout
Reading http://download.zope.org/ppix/zc.buildout/
Best match: zc.buildout 1.0.0b28
Downloading http://cheeseshop.python.org/packages/2.5/z/zc.buildout/zc.buildout-1.0.0b28-py2.5.egg#md5=4e37e53f010ed7984555a029732f479d
Processing zc.buildout-1.0.0b28-py2.5.egg
creating /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
Extracting zc.buildout-1.0.0b28-py2.5.egg to /home/jim/tmp/env/lib/python2.5
Adding zc.buildout 1.0.0b28 to easy-install.pth file
Installing buildout script to /home/jim/tmp/env/bin/
Installed /home/jim/tmp/env/lib/python2.5/zc.buildout-1.0.0b28-py2.5.egg
Processing dependencies for zc.buildout
Searching for setuptools==0.6c6
Best match: setuptools 0.6c6
Processing setuptools-0.6c6-py2.5.egg
Adding setuptools 0.6c6 to easy-install.pth file
Installing easy_install script to /home/jim/tmp/env/bin/
Installing easy_install-2.5 script to /home/jim/tmp/env/bin/
Installed /home/jim/tmp/env/lib/python2.5/setuptools-0.6c6-py2.5.egg
Processing dependencies for setuptools==0.6c6
Finished processing dependencies for setuptools==0.6c6
Finished installing setuptools==0.6c6
Finished processing dependencies for zc.buildout
Finished installing zc.buildout
real 0m7.006s
user 0m0.244s
sys 0m0.040s
Note:
- We made far fewer requests with the new index.
- Most of the time in the second example was spent actually
downloading the buildout distribution. Most of the time in the
first example was spent reading the index.
- I used workingenv to create clean environments for each of the
examples above.
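
As a rough check on the two runs above, the following sketch counts the
"Reading" lines in an easy_install transcript and compares the two
wall-clock times. The helper name is made up, and the excerpt is a
shortened copy of the first transcript, not the whole thing.

```python
def count_index_reads(transcript):
    """Count the pages easy_install reported reading."""
    return sum(1 for line in transcript.splitlines()
               if line.strip().startswith("Reading "))


# "real" times reported by the two runs above, in seconds.
PYPI_REAL_S = 31.360
PPIX_REAL_S = 7.006

# Shortened excerpt of the first transcript, for demonstration only.
PYPI_EXCERPT = """Searching for zc.buildout
Reading http://cheeseshop.python.org/pypi/zc.buildout/
Reading http://cheeseshop.python.org/pypi/zc.buildout/1.0.0b19
Reading http://svn.zope.org/zc.buildout
Best match: zc.buildout 1.0.0b28"""

if __name__ == "__main__":
    print("pages read in excerpt:", count_index_reads(PYPI_EXCERPT))
    print("speedup: %.1fx" % (PYPI_REAL_S / PPIX_REAL_S))
```

Run against the full transcripts, the first shows 15 pages read and the
second shows 1.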
WRT zc.buildout, refreshing a buildout with just ZODB installed in
it takes about 45 seconds for me using PyPI and about 5 seconds
using the experimental index.
Some of the speed improvement is due to the fact that the
experimental index is much closer to me (on the net) than PyPI.
ATM, requests to PyPI take *me* around 500 milliseconds, while
requests to the experimental index are taking between 100 and 300
milliseconds. (I'm at home and this seems to be somewhat
variable.) Most of the speed improvements are from reducing the
number of requests.
I'm polling PyPI once a minute to get and apply updates. Thanks to
the new XML-RPC method that Martin added, this is very efficient to
do.
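
The polling loop is simple in outline. The sketch below is only an
illustration of the idea, not my actual software: fetch_changes and
rebuild_page are hypothetical callables standing in for the XML-RPC
query and for regenerating one package page, and the (name, timestamp)
shape of the change records is an assumption.

```python
import time


def poll_once(fetch_changes, rebuild_page, since):
    """Apply all changes reported after `since`; return the new high-water mark.

    fetch_changes(since) is assumed to yield (package_name, timestamp) pairs.
    """
    newest = since
    changed = set()
    for name, stamp in fetch_changes(since):
        changed.add(name)
        newest = max(newest, stamp)
    # Rebuild each changed package page once, however many changes it had.
    for name in sorted(changed):
        rebuild_page(name)
    return newest


def poll_forever(fetch_changes, rebuild_page, since, interval_s=60):
    """Poll once a minute by default, carrying the timestamp forward."""
    while True:
        since = poll_once(fetch_changes, rebuild_page, since)
        time.sleep(interval_s)
```

Because only packages changed since the last poll are touched, each
poll costs one request plus one page rebuild per changed package.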
I encourage people to check this out and even try using it with
easy_install and especially buildout. AFAIK, aside from being much
faster and showing download files for hidden releases, it is
completely equivalent to PyPI for setuptools use. My intention is
to keep this experimental index going and up to date for the
foreseeable future, and I plan to use it for all my work.
My primary goal is to prototype the new index format. If this
seems useful, then I think that www.python.org should expose an
index in this format to setuptools, either at a different URL or by
satisfying setuptools requests from the index based on client
information. I'd love to see this index populated via a baking
mechanism that updates package pages when they change, rather than
through polling as I'm doing.
There would be some benefit to having geographic mirrors. I
suspect that having such mirrors available would improve
performance further, at least for some folks. It might also be
useful to have some mirrors for redundancy purposes. Note though
that what I'm doing is mirroring only the index data. I'm not
mirroring distributions. Of course, I'd be happy to make my
software available. (It already is via our subversion repository.)
I hope this effort spurs useful discussion and progress.
Jim
--
Jim Fulton mailto:[EMAIL PROTECTED] Python
Powered!
CTO (540) 361-1714
http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org