Ian Bicking wrote:
Remi Delon wrote:

I'm wondering -- and this is mostly directed to the hosting providers (Remi, Sean...) -- what are the problems with providing commodity-level hosting for Python programs? I can think of some, but I'm curious what you've encountered and if you have ideas about how to improve things.

Some things I've thought about:
* Long-running processes are hard to maintain (assuming we rule out CGI). Code becomes stale, maybe the server process gets into a bad state. Sometimes processes become wedged. With mod_python this can affect the entire site.



Yes, maintaining long-running processes can be a pain, but that's not related to Python itself; it's true regardless of the language the program was written in.


* Isolating clients from each other can be difficult. For mod_python I'm assuming each client needs their own Apache server.



Yes, that's how we ended up setting up our mod_python accounts.
We also found stability problems in some of the other mod_* modules (mod_webkit, mod_skunkweb, ...) and they sometimes crashed the main Apache server (very bad). So for all the frameworks that support a standalone HTTP server mode (CherryPy, Webware, Skunkweb, ...) we now set them up as standalone HTTP server listening on a local port, and we just use our main Apache server as a proxy to these servers.
This allows us to use the trick described on this page: http://www.cherrypy.org/wiki/BehindApache (look for "autostart.cgi") to have Apache restart the server automatically if it ever goes down.
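The core of that trick boils down to something like this (a simplified sketch, not the actual autostart.cgi; the port and start command here are just placeholders): Apache falls back to a small CGI script when the proxied backend stops answering, and the script brings the backend back up:

#!/usr/bin/env python
# Simplified sketch of the autostart idea, not the real script.
# BACKEND_PORT and START_COMMAND are placeholders for the real values.
import socket
import subprocess
import sys

BACKEND_PORT = 8080                               # local port the framework listens on
START_COMMAND = ["/home/someuser/site/start.sh"]  # whatever launches the long-running server

def backend_is_up(port, timeout=1.0):
    """Return True if something accepts connections on localhost:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(("127.0.0.1", port))
        return True
    except socket.error:
        return False
    finally:
        sock.close()

if not backend_is_up(BACKEND_PORT):
    # Start the server detached from this short-lived CGI process.
    subprocess.Popen(START_COMMAND, close_fds=True)

# Tell the browser to retry the original URL in a few seconds.
sys.stdout.write("Status: 503 Service Unavailable\r\n")
sys.stdout.write("Retry-After: 5\r\n")
sys.stdout.write("Content-Type: text/plain\r\n\r\n")
sys.stdout.write("The site is starting up, please retry in a few seconds.\n")

On the Apache side, mod_rewrite sends requests to that script when the proxy target is down (the wiki page has the exact rules).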


On our own servers we've been using CGI connectors (wkcgi, Zope.cgi), which seem fast enough, and of course won't be crashing Apache.

Yeah, but we wanted a somewhat "standard" way of talking to Apache, and most frameworks do come with a small HTTP server, so that works fine for us; it also completely isolates the process from Apache.

Have you looked at Supervisor for long running processes?
http://www.plope.com/software/supervisor/
I haven't had a chance to use it, but it looks useful for this sort of thing.

Well, there are several such supervising tools (daemontools is another one), but again, they never matched our exact needs. For instance, sometimes it's OK if a process is down ... it could just be that the user is working on their site. And they usually only watch one thing, making sure that the process stays up, but there are a million other things we wanted to watch for. So we just wrote our own scripts.


HTTP does seem like a reasonable way to communicate between servers, instead of all these ad hoc HTTP-like protocols (PCGI, SCGI, FastCGI, mod_webkit, etc.). My only disappointment with that technique is that you lose some context: whether REMOTE_USER is set, SCRIPT_NAME/PATH_INFO (you probably have to configure your URLs, since they aren't detectable), mod_rewrite's additional environment variables, and so on. Hmm... I notice you use custom headers for that (CP-Location), and I suppose other variables could also be passed through... it's just unfortunate because that adds significantly to the Apache configuration, which is something I try to avoid -- it's easy enough to put in place, but hard to maintain.
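To make that concrete, here's roughly what I have in mind on the backend side, as a WSGI-style wrapper (the X-Forwarded-User / X-Script-Name header names are just made up for the example; there's no convention for them):

# Illustration only: the header names are invented, not a standard.
def restore_proxy_context(app):
    """Wrap a WSGI app so context forwarded by the proxy in headers is restored."""
    def middleware(environ, start_response):
        user = environ.get("HTTP_X_FORWARDED_USER")
        if user:
            environ["REMOTE_USER"] = user
        script_name = environ.get("HTTP_X_SCRIPT_NAME")
        if script_name:
            # Re-split the URL so SCRIPT_NAME/PATH_INFO look the way they
            # would if the application were mounted directly under Apache.
            environ["SCRIPT_NAME"] = script_name
            path = environ.get("PATH_INFO", "")
            if path.startswith(script_name):
                environ["PATH_INFO"] = path[len(script_name):]
        return app(environ, start_response)
    return middleware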

The CP-Location trick is not needed (I should remove it from this page as it confuses people).
Have a look at the section called "What are the drawbacks of running CherryPy behind Apache?" on this page: http://www.cherrypy.org/wiki/CherryPyProductionSetup
It summarizes my view on this (basically, there aren't any real drawbacks if you're using mod_rewrite with Apache2).



Maybe this isn't as much of a problem these days, as virtualization technologies have improved, and running multiple Apache processes isn't that big a deal.
* Setup of frameworks is all over the place. Setting up multiple frameworks might be even more difficult. Some of them may depend on mod_rewrite. Server processes are all over the place as well.


But I don't have a real feeling for how to solve these, and I'm sure there's things I'm not thinking about.

Well, the two main problems that I can think of are:
- Python frameworks tend to work as long-running processes, which have a lot of advantages for your site, but are a nightmare for hosting providers. There are soooo many things to watch for: CPU usage (a process can start "spinning"), RAM usage, processes crashing, ... But that is not related to Python, and any hosting provider that supports long-running processes faces the same challenge. For instance, we support Tomcat and the problems are the same. For this we ended up writing a lot of custom monitoring scripts ourselves (we couldn't find exactly what we needed out there). Fortunately, Python makes it easy to write these scripts :-)


Do you do monitoring on a per-process basis (like a supervisor process) or just globally scan through the processes and kill off any bad ones?

We monitor the general health of our servers on various levels, and we monitor the response time of some key sites/services on each of our servers to make sure that overall the server is OK.
For each individual customer site, we only have scripts that try to restart the site if it ever goes down, but that's it (if the customer changed their site and broke it, there isn't much we can do about it).
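In outline, one of those per-site checks looks something like this (a sketch; the URL, paths and timeout are placeholders, and it also honours the "the user took the site down on purpose" case I mentioned earlier):

# Sketch of a per-site check run from cron: if the site stops answering,
# restart it, unless the owner has flagged it as intentionally down.
# The URL, file names, command and timeout are placeholders.
import os
import subprocess
import time
import urllib.request

SITE_URL = "http://127.0.0.1:8080/"
DISABLED_FLAG = "/home/someuser/site/dont-restart"   # owner creates this while working on the site
RESTART_COMMAND = ["/home/someuser/site/start.sh"]
TIMEOUT = 10.0                                       # seconds before the site counts as down

def response_time(url, timeout):
    """Return how long the site took to answer, or None if it didn't."""
    start = time.time()
    try:
        urllib.request.urlopen(url, timeout=timeout).read(1)
        return time.time() - start
    except Exception:
        return None

if __name__ == "__main__":
    if not os.path.exists(DISABLED_FLAG):
        if response_time(SITE_URL, TIMEOUT) is None:
            subprocess.Popen(RESTART_COMMAND, close_fds=True)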


I've thought that a forking server with a parent that monitored children carefully would be nice, which would be kind of a per-process monitor. It would mean I'd have to start thinking multiprocess, reversing all my threaded habits, but I think I'm willing to do that in return for really good reliability.
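Very roughly, I'm picturing something like this, where serve_forever() stands in for whatever the real request loop would be:

# Rough sketch of a pre-forking server whose parent only supervises.
# serve_forever() is a stand-in for the real request-handling loop.
import os
import time

NUM_CHILDREN = 4

def serve_forever():
    """Placeholder: handle requests until the process dies."""
    while True:
        time.sleep(1)

def spawn_child():
    pid = os.fork()
    if pid == 0:                    # child: serve and never return
        try:
            serve_forever()
        finally:
            os._exit(1)
    return pid                      # parent: remember the child's pid

children = set()
for _ in range(NUM_CHILDREN):
    children.add(spawn_child())

while True:
    pid, status = os.wait()         # block until any child exits
    if pid in children:
        children.remove(pid)
        children.add(spawn_child()) # replace the dead child right away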

I'm still very much in the "thread pool" camp :-)
I've got CherryPy sites that run in a thread pool mode for months without any stability or memory leak problem.
If your process crashes or leaks memory then there's something wrong with your program in the first place, and the right way to solve it is not to switch to a multiprocess model.
Finally, if you want a monitoring process, it can be a completely separate process, which lets you keep the "thread pool" model for your main process.


- But another challenge (and this one is more specific to Python) is the number of Python versions and third-party modules that we have to support. For instance, at Python-Hosting.com, we have to support all four versions of Python: 2.1, 2.2, 2.3 and 2.4, and all of them are being used by various people. And for each version, we usually have 10 to 20 third-party modules (mysql-python, psycopg, elementtree, sqlobject, ...) that people need! We run Red Hat Enterprise 3, but RPMs for Python are not designed to work with multiple Python versions installed, and RPMs for third-party modules are usually nonexistent. As a result, we have to build all the Python-related stuff from source. And some of these modules are sometimes hard to build (the python-subversion bindings, for instance) and you can run into library-version-compatibility nightmares. And as if this wasn't enough, new releases of modules come out every day ...


For the apps I've been deploying internally -- where we have both a more controlled and less controlled environment than a commercial host -- I've been installing every prerequisite in a per-application location, i.e., ``python setup.py install --install-lib=app/stdlib``. Python module versioning issues are just too hard to resolve, and I'd rather leave the standard site-packages with only really stable software that I don't often need to update (like mxDateTime), and put everything else next to the application.

Well, we have a mix of both: for all "more or less common" modules, we install them system-wide. If someone wants a really "esoteric" module that no one else on the server is likely to use, we usually tell them to install it in their home directory.
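Either way, the application just has to make sure its private directory is searched first; roughly (the directory names are only examples):

# Put this near the top of the application's startup code so that modules
# installed next to the application (or in the user's home directory) win
# over whatever is installed system-wide.  Directory names are examples.
import os
import sys

APP_DIR = os.path.dirname(os.path.abspath(__file__))
PRIVATE_LIB = os.path.join(APP_DIR, "stdlib")     # e.g. the --install-lib target

if PRIVATE_LIB not in sys.path:
    sys.path.insert(0, PRIVATE_LIB)

# From here on, "import sqlobject" and friends pick up the per-application copies.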


I think that this second point is the main challenge, and any hosting provider that is not specialized in Python doesn't have the time or the knowledge to build and maintain all these Python versions and third-party modules. Of course, they could just say "we're going to support this specific Python version with these few third-party modules and that's it", but experience shows that most people need at least one or two "uncommon" third-party modules for their site, so if a module is missing they just can't run their site ...

Any reason for all the Python versions? Well, I guess it's hard to ask clients to upgrade. If I were to support people in that way, I'd probably try to standardize on a Python version or two, and some core modules (probably the ones that are harder to build, like database drivers), and ask users to install everything else in their own environment. But of course when you're providing a service you have to do what people want you to do...

Well, we very much decide what software/versions we support based on customer demand ... If enough people want Python 2.1, 2.2, 2.3 and 2.4 (which is the case right now), then we support all of them ...
Recently there was high demand for commercial Trac/Subversion hosting with backups and HTTPS access, so we came up with such an offer and it turned out to be quite successful.


But above all, I think that the main reason why Python frameworks are not more commonly supported by the big hosting providers is that the market for these frameworks is very small (apart from Zope/Plone). For all the "smaller" frameworks (CherryPy, Webware, SkunkWeb, Quixote, ...) we host fewer than 50 sites of each, so the big hosting providers simply won't bother learning these frameworks and supporting them for such a small market.

If they could support all of them at once, do you think it would be more interesting to hosting providers?

Well, if all frameworks came in nicely packaged RPMs and they all integrated the same way with Apache (mod_wsgi, anyone?), I guess that would be a big step forward ... But you'd still have the problem of all the Python third-party modules that people need ...


Remi.


