On 2011-04-14 10:34:59 -0700, Ian Bicking said:

I think there's a general concept we should have, which I'll call a "script" -- but basically it's a script to run (__main__-style), a callable to call (module:name), or a URL to fetch internally.

Agreed. The reference notation I mentioned in my reply to Graham, with the addition of URI syntax, covers all of those options.

I want to keep this distinct from anything long-running, which is a much more complex deal.

The primary application is only potentially long-running. (You could, in theory, deploy an app as CGI, but that way lies madness.) However, the reference syntax mentioned (excepting URL) works well for identifying this.

I think, given the three options and for general simplicity, the script can either succeed or fail (for Python code: exception or not; for __main__: zero exit code or not; for a URL: 2xx status or not), and can return some text (which may only be informational, not structured?)

For the simple cases (script / callable), it's pretty easy to trap STDOUT and STDERR, deliver INFO log messages to STDOUT and everything else to STDERR, then display that to the administrator in some form. Same for HTTP, except that the output can include full HTML formatting.
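Purely as a sketch of what that trapping could look like (the run_reference helper, its dispatch rules, and the use of modern Python 3 facilities are all my own illustration, not a proposed spec):

# Illustrative only: run a "script" reference in one of the three forms
# discussed above and report success plus captured output.
import importlib
import io
import subprocess
import sys
import urllib.error
import urllib.request
from contextlib import redirect_stderr, redirect_stdout


def run_reference(ref):
    """Return (success, output) for a script path, module:callable, or URL."""
    if ref.startswith(('http://', 'https://')):
        # URL: success is a 2xx response; the body is informational text.
        try:
            with urllib.request.urlopen(ref) as response:
                return True, response.read().decode('utf-8', 'replace')
        except urllib.error.HTTPError as exc:
            return False, exc.read().decode('utf-8', 'replace')

    if ':' in ref:
        # module:callable -- success means "no exception raised".
        module_name, _, attr = ref.partition(':')
        target = getattr(importlib.import_module(module_name), attr)
        out, err = io.StringIO(), io.StringIO()
        try:
            with redirect_stdout(out), redirect_stderr(err):
                target()
            return True, out.getvalue() + err.getvalue()
        except Exception:
            return False, out.getvalue() + err.getvalue()

    # __main__-style script -- success means a zero exit code.
    proc = subprocess.run([sys.executable, ref], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr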

An application configuration could refer to scripts under different names, to be invoked at different stages.

À la the already-mentioned post-install, pre-upgrade, post-upgrade, pre-removal, and cron-like hooks. Any others?
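For example, the configuration might name them along these lines (the stage names and layout here are only illustrative, nothing settled):

# Illustrative only: lifecycle scripts named per stage, using the
# module:callable reference form.  All names below are made up.
import yaml  # PyYAML

config = yaml.safe_load("""
scripts:
    post-install: "myapp.setup:initialize_database"
    pre-upgrade: "myapp.setup:backup_database"
    post-upgrade: "myapp.setup:migrate_database"
    pre-removal: "myapp.setup:export_data"
    cron-nightly: "myapp.tasks:cleanup"
""")

for stage, reference in config['scripts'].items():
    print(stage, '->', reference)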

There could be an optional self-test script, where the application could do a last self-check -- import whatever it wanted, check db settings, etc.  Of course we'd want to know what it needed *before* the self-check to try to provide it, but double-checking is of course good too.

Unit and functional tests are the most obvious, in which case we'll need to be able to provide a localhost-only 'mounted' location for the application even though it hasn't been installed yet.
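Roughly along these lines (a sketch only; demo_app stands in for the real WSGI callable, and the port handling is just one way to do it):

# Illustrative only: serve the not-yet-installed application on a
# localhost-only socket so functional tests have something to hit.
import threading
from wsgiref.simple_server import make_server


def demo_app(environ, start_response):
    # Stand-in for the application's real WSGI callable.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'self-test ok']


# Port 0 asks the OS for a free port; binding to 127.0.0.1 keeps it private.
server = make_server('127.0.0.1', 0, demo_app)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

print('functional tests can target http://%s:%d/' % (host, port))
# ... run the test suite against that URL, then:
server.shutdown()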

One advantage to a separate script instead of just one script-on-install is that you can more easily indicate *why* the installation failed.  For instance, script-on-install might fail because it can't create the database tables it needs, which is a different kind of error than a library not being installed, or being fundamentally incompatible with the container it is in.  In some sense maybe that's because we aren't proposing a rich error system -- but realistically a lot of these errors will be TypeError, ImportError, etc., and trying to normalize those errors to some richer meaning is unlikely to be done effectively (especially since error cases are hard to test, since they are the things you weren't expecting).

Humans are potentially better at reading tracebacks than machines are, so my previous logging idea (script output stored and displayed to the administrator in a readable form) combined with a modicum of reasonable exception handling within the script should lead to fairly clear errors.

Categorizing services seems unnecessary.

The description of the different database options was for illustration, not actual separation and categorization.

I'd like to see maybe an | operator, and a distinction between required and optional services.  E.g.:

No need for a new operator; YAML already supports lists:

services:
        - [mysql, postgresql, dburl]

Or:

services:
        required:
                - files

        optional:
                - [mysql, postgresql]

And then there's a lot more you could do... indicating which one you prefer, for instance.

The order of services within one of these lists would indicate preference, thus MySQL is preferred over PostgreSQL in the second example, above.
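Resolution on the server side could then be as simple as walking each entry in order; a sketch (the data shapes follow the examples above, available_services is made up):

# Illustrative only: pick the first alternative the server actually offers.
# A plain string means "exactly this service"; a list means "any of these,
# in order of preference".

def resolve(requested, available):
    """Return the chosen service name, or None if nothing matches."""
    alternatives = [requested] if isinstance(requested, str) else requested
    for candidate in alternatives:
        if candidate in available:
            return candidate
    return None


available_services = {'postgresql', 'files'}

print(resolve('files', available_services))                  # files
print(resolve(['mysql', 'postgresql'], available_services))  # postgresql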

Tricky things:
- You need something funny like multiple databases.  This is very service-specific anyway, and there might sometimes need to be a way to configure the service.  It's also a fairly obscure need.

I'm not convinced that connecting to a legacy database /and/ a current database is that obscure. It's also not as hard as Django makes it look (with a 1M SLoC change to add support)… WebCore added support in three lines.
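Not WebCore's actual three lines, but the general shape of it is roughly this (SQLAlchemy and SQLite URLs used purely to keep the sketch self-contained):

# Illustrative only -- not WebCore's real implementation.  The point is
# just that "multiple databases" reduces to a named mapping of engines.
from sqlalchemy import create_engine

databases = {
    'default': create_engine('sqlite:///current.db'),  # e.g. postgresql://...
    'legacy': create_engine('sqlite:///legacy.db'),     # e.g. mysql://...
}

# Application code then asks for a connection by name:
with databases['legacy'].connect() as connection:
    pass  # issue queries against the legacy schema here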

- You need multiple applications to share data.  This is hard, not sure how to handle it.  Maybe punt for now.

That's what higher-level APIs are for. ;)

You mean, the application provides its own HTTP server?  I certainly wouldn't expect that...?

Nor would I; running an HTTP server would be daft. Using mod_wsgi, FastCGI over on-disk sockets, or another persistent connector makes far more sense, and is what I plan to do.

Unless you have a very, very specific need (e.g. Tornado), running a Python HTTP server in production and then HTTP-proxying to it is inefficient and a terrible idea. (Easy deployment model, terrible overhead/performance.)

Anyway, in terms of aggregate, I mean something like a "site" that is made up of many "applications", and maybe those applications are interdependent in some fashion.  That adds lots of complications, and though there are lots of use cases for that I think it's easier to think in terms of apps as simpler building blocks for now.

That's not complicated at all; I do those types of aggregate sites fairly regularly. E.g.

/ - CMS
/location - Location & image database.
/resource - Business database.
/admin - Flex administration interface.

That's done at the Nginx/Apache level, where it's most efficient to do so, not in Python.

Sure; these would be tool options, and if you set everything up you are requiring the deployer to invoke the tools correctly to get everything in place, which is a fine starting point before formalizing anything.

What? Not even close: the person deploying an application relies on the application server/service to configure the web server of choice; there is no need for deployer action after the initial "Nginx, include all .conf files from folder X", where folder X is managed by the app server. (That's one line in /etc/nginx/nginx.conf.)

Hm... I guess this is an ordering question.  You could import logging and set up defaults, but that doesn't give the container a chance to overwrite those defaults.  You could have the container set up logging, then make sure the app sets defaults only when the container hasn't -- but I'm not sure if it's easy to use the logging module that way.

The logging configuration, in dict form, is passed from the app server to the container. The default logging levels are read by the app server from the container. It's trivially easy, especially when INI and YAML files can be programmatically created.
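Roughly like this on the container side; logging.config.dictConfig is standard library (2.7/3.2+), while the YAML itself is just an illustration:

# Illustrative only: the container applies logging configuration handed to
# it by the app server as a plain dictionary (here parsed from YAML).
import logging.config
import yaml  # PyYAML

LOGGING_YAML = """
version: 1
formatters:
    simple:
        format: '%(asctime)s %(levelname)s %(name)s: %(message)s'
handlers:
    console:
        class: logging.StreamHandler
        formatter: simple
root:
    level: INFO
    handlers: [console]
"""

logging.config.dictConfig(yaml.safe_load(LOGGING_YAML))
logging.getLogger('myapp').info('logging configured by the container')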

Well, maybe that's not hard -- if you have something like silvercustomize.py that is always imported, and imported fairly early on, and have the container overwrite logging settings before it *does* anything (e.g., sends a request), then you should be okay?

Indeed; container-setup.py or whatever.

Rich configurations are problematic in their own ways.  While the str-key/str-value of os.environ is somewhat limited, I wouldn't want anything richer than JSON (list, dict, str, numbers, bools).

JSON is a subset of YAML. I honestly believe YAML meets the requirements for richness, simplicity, flexibility, and portability that a configuration format really needs.

And then we have to figure out a place to drop the configuration.  Because we are configuring the *process*, not a particular application or request handler, a callable isn't great (unless we expect the callable to drop the config somewhere and other things to pick it up?)

I've already mentioned an environment variable, APP_CONFIG_PATH, identifying the path to the on-disk configuration file; it would be read in and acted upon by the container-setup.py file, which is imported before the rest of the application. Also, the application-factory idea of passing in the already-parsed configuration dictionary is quite handy here.
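In sketch form (APP_CONFIG_PATH and container-setup.py are from the discussion above; the factory itself is a made-up placeholder):

# Illustrative only: what a container-setup.py might do with the
# APP_CONFIG_PATH environment variable mentioned above.
import os
import yaml  # PyYAML


def make_application(config):
    # Placeholder for the application's real factory, which would build and
    # return the configured WSGI callable.
    def application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [config.get('greeting', 'hello').encode('utf-8')]
    return application


with open(os.environ['APP_CONFIG_PATH']) as handle:
    config = yaml.safe_load(handle)

# Hand the already-parsed configuration dictionary to the application factory.
application = make_application(config)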

I found at least giving one valid hostname (and yes, it should include a path) was important for many applications.  E.g., a bunch of apps have tendencies to put hostnames in the database.

Luckily, that's a bad habit we can discourage.  ;)

I'm not psyched about pointing to a file, though I guess it could work -- it's another kind of peculiar drop-the-config-somewhere-and-wait-for-someone-to-pick-it-up.  At least dropping it directly in os.environ is easy to use directly (many things allow os.environ interpolation already) and doesn't require any temporary files.  Maybe there's a middle ground.

Picked up by the container-setup.py site-customize script. What's the limit on the size of a variable in the environ? (Also, that memory gets permanently allocated for the life of the application; not very efficient if we're just going to convert it to a rich internal structure.)

:: Application (package) name.

This doesn't seem meaningful to me -- there's no need for a one-to-one mapping between these applications and a particular package.  Unless you mean some attempt at a unique name that can be used for indexing?

You're mixing something up here. Each application is a single primary package with its dependencies; one container per application.

It would also need a way to specify things like what port to run on

Automatically allocated by the app server.

public or private interface

Chosen by the deployer during deployment time configuration.

maybe indicate what kind of proxying (if any) is valid

If it's WSGI, it's irrelevant. If it's a network service, it shouldn't be HTTP.

maybe process management parameters

For WSGI apps, it's transparent. Each app server would have its own preference (e.g. mine will prefer FastCGI on-disk sockets) and the application will be blissfully unaware of that.

ways to inspect the process itself (since *maybe* you can't send internal HTTP requests into it), etc.

Interesting idea, not sure how that would be implemented or used, though.

PHP! ;)

PHP can be deployed as a WSGI application.  :P

I'm not personally that happy with how App Engine does it, as an example -- it requires a regex-based dispatch.

Regex dispatch is terrible. (I've actually encountered Python's 56KiB regular expression size limit on one project!) Simply exporting folders as "top level" webroots would be sufficient, methinks.
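Something like plain prefix dispatch over the exported webroots would cover it; a sketch (the mount table and app names are placeholders):

# Illustrative only: longest-prefix dispatch over exported "webroots"
# instead of a pile of regexes.

def make_dispatcher(mounts):
    """mounts: mapping of path prefix -> WSGI application."""
    ordered = sorted(mounts.items(), key=lambda item: len(item[0]), reverse=True)

    def dispatcher(environ, start_response):
        path = environ.get('PATH_INFO', '/')
        for prefix, app in ordered:
            if path == prefix or path.startswith(prefix.rstrip('/') + '/'):
                if prefix != '/':
                    # Shift the matched prefix from PATH_INFO onto SCRIPT_NAME.
                    environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                    environ['PATH_INFO'] = path[len(prefix):]
                return app(environ, start_response)
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'not found']

    return dispatcher

# e.g. site = make_dispatcher({'/': cms_app, '/location': location_app}),
# where cms_app and location_app are whatever WSGI callables got exported.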

Anything "string-like" or otherwise fancy requires more support libraries for the application to actually be able to make use of the environment.  Maybe necessary, but it should be done with great reluctance IMHO.

I've had great success with string-likes in WebCore/Marrow and TurboMail for things like e-mail address lists, e-mail addresses, and URLs.
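Not the actual WebCore/Marrow or TurboMail classes, just the general idea of a "string-like": it behaves as a str everywhere existing code expects one, with a little parsed structure bolted on:

# Illustrative only -- not the real library classes.  A "string-like"
# subclasses str, so existing code keeps working unchanged, while exposing
# a bit of structure.

class Address(str):
    """An e-mail address that still behaves as an ordinary string."""

    @property
    def local(self):
        return self.partition('@')[0]

    @property
    def domain(self):
        return self.partition('@')[2]


addr = Address('alice@example.com')
print(addr.upper())   # ordinary str behaviour: ALICE@EXAMPLE.COM
print(addr.domain)    # extra structure: example.com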

        — Alice.

