On 2011-04-14 10:34:59 -0700, Ian Bicking said:

I think there's a general concept we should have, which I'll call a "script" -- but basically it's a script to run (__main__-style), a callable to call (module:name), or a URL to fetch internally.

Agreed. The reference notation I mentioned in my reply to Graham, with the addition of URI syntax, covers all of those options.

I want to keep this distinct from anything long-running, which is a much more complex deal.

The primary application is only potentially long-running. (You could, in theory, deploy an app as CGI, but that way lies madness.) However, the reference syntax mentioned (excepting URL) works well for identifying this.

I think, given the three options and for general simplicity, the script can either succeed or fail (for Python code: exception or not; for __main__: zero exit code or not; for a URL: 2xx status or not), and can return some text (which may only be informational, not structured?)

For the simple cases (script / callable), it's pretty easy to trap STDOUT and STDERR, deliver INFO log messages to STDOUT and everything else to STDERR, then display that to the administrator in some form. Same for HTTP, except that the output can include full HTML formatting.
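Purely as a sketch of what that trapping could look like (the run_reference helper, its dispatch rules, and the use of modern Python 3 facilities are all my own illustration, not a proposed spec):

# Illustrative only: run a "script" reference in one of the three forms
# discussed above and report success plus captured output.
import importlib
import io
import subprocess
import sys
import urllib.error
import urllib.request
from contextlib import redirect_stderr, redirect_stdout


def run_reference(ref):
    """Return (success, output) for a script path, module:callable, or URL."""
    if ref.startswith(('http://', 'https://')):
        # URL: success is a 2xx response; the body is informational text.
        try:
            with urllib.request.urlopen(ref) as response:
                return True, response.read().decode('utf-8', 'replace')
        except urllib.error.HTTPError as exc:
            return False, exc.read().decode('utf-8', 'replace')

    if ':' in ref:
        # module:callable -- success means "no exception raised".
        module_name, _, attr = ref.partition(':')
        target = getattr(importlib.import_module(module_name), attr)
        out, err = io.StringIO(), io.StringIO()
        try:
            with redirect_stdout(out), redirect_stderr(err):
                target()
            return True, out.getvalue() + err.getvalue()
        except Exception:
            return False, out.getvalue() + err.getvalue()

    # __main__-style script -- success means a zero exit code.
    proc = subprocess.run([sys.executable, ref], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr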

An application configuration could refer to scripts under different names, to be invoked at different stages.

À la the already-mentioned post-install, pre-upgrade, post-upgrade, pre-removal, and cron-like hooks. Any others?
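For example, the configuration might name them along these lines (the stage names and layout here are only illustrative, nothing settled):

# Illustrative only: lifecycle scripts named per stage, using the
# module:callable reference form.  All names below are made up.
import yaml  # PyYAML

config = yaml.safe_load("""
scripts:
    post-install: "myapp.setup:initialize_database"
    pre-upgrade: "myapp.setup:backup_database"
    post-upgrade: "myapp.setup:migrate_database"
    pre-removal: "myapp.setup:export_data"
    cron-nightly: "myapp.tasks:cleanup"
""")

for stage, reference in config['scripts'].items():
    print(stage, '->', reference)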

There could be an optional self-test script, where the application could do a last self-check -- import whatever it wanted, check db settings, etc.  Of course we'd want to know what it needed *before* the self-check to try to provide it, but double-checking is of course good too.

Unit and functional tests are the most obvious, in which case we'll need to be able to provide a localhost-only 'mounted' location for the application even though it hasn't been installed yet.
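Roughly along these lines (a sketch only; demo_app stands in for the real WSGI callable, and the port handling is just one way to do it):

# Illustrative only: serve the not-yet-installed application on a
# localhost-only socket so functional tests have something to hit.
import threading
from wsgiref.simple_server import make_server


def demo_app(environ, start_response):
    # Stand-in for the application's real WSGI callable.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'self-test ok']


# Port 0 asks the OS for a free port; binding to 127.0.0.1 keeps it private.
server = make_server('127.0.0.1', 0, demo_app)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

print('functional tests can target http://%s:%d/' % (host, port))
# ... run the test suite against that URL, then:
server.shutdown()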

One advantage to a separate script instead of just one script-on-install is that you can more easily indicate *why* the installation failed.  For instance, script-on-install might fail because it can't create the database tables it needs, which is a different kind of error than a library not being installed, or being fundamentally incompatible with the container it is in.  In some sense maybe that's because we aren't proposing a rich error system -- but realistically a lot of these errors will be TypeError, ImportError, etc., and trying to normalize those errors to some richer meaning is unlikely to be done effectively (especially since error cases are hard to test, since they are the things you weren't expecting).

Humans are potentially better at reading tracebacks than machines are, so my previous logging idea (script output stored and displayed to the administrator in a readable form) combined with a modicum of reasonable exception handling within the script should lead to fairly clear errors.

Categorizing services seems unnecessary.

The description of the different database options was for illustration, not actual separation and categorization.

I'd like to see maybe an | operator, and a distinction between required and optional services.  E.g.:

No need for a new operator; YAML already supports lists:

services:
        - [mysql, postgresql, dburl]

Or:

services:
        required:
                - files

        optional:
                - [mysql, postgresql]

And then there's a lot more you could do... indicating which one you prefer, for instance.

The order of services within one of these lists would indicate preference, thus MySQL is preferred over PostgreSQL in the second example, above.
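Resolution on the server side could then be as simple as walking each entry in order; a sketch (the data shapes follow the examples above, available_services is made up):

# Illustrative only: pick the first alternative the server actually offers.
# A plain string means "exactly this service"; a list means "any of these,
# in order of preference".

def resolve(requested, available):
    """Return the chosen service name, or None if nothing matches."""
    alternatives = [requested] if isinstance(requested, str) else requested
    for candidate in alternatives:
        if candidate in available:
            return candidate
    return None


available_services = {'postgresql', 'files'}

print(resolve('files', available_services))                  # files
print(resolve(['mysql', 'postgresql'], available_services))  # postgresql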

Tricky things:
- You need something funny like multiple databases.  This is very service-specific anyway, and there might sometimes need to be a way to configure the service.  It's also a fairly obscure need.

I'm not convinced that connecting to a legacy database /and/ a current database is that obscure. It's also not as hard as Django makes it look (with a 1M SLoC change to add support)… WebCore added support in three lines.
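Not WebCore's actual three lines, but the general shape of it is roughly this (SQLAlchemy and SQLite URLs used purely to keep the sketch self-contained):

# Illustrative only -- not WebCore's real implementation.  The point is
# just that "multiple databases" reduces to a named mapping of engines.
from sqlalchemy import create_engine

databases = {
    'default': create_engine('sqlite:///current.db'),  # e.g. postgresql://...
    'legacy': create_engine('sqlite:///legacy.db'),     # e.g. mysql://...
}

# Application code then asks for a connection by name:
with databases['legacy'].connect() as connection:
    pass  # issue queries against the legacy schema here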

- You need multiple applications to share data.  This is hard, not sure how to handle it.  Maybe punt for now.

That's what higher-level APIs are for. ;)

You mean, the application provides its own HTTP server?  I certainly wouldn't expect that...?

Nor would I; running an HTTP server would be daft. Using mod_wsgi, FastCGI over on-disk sockets, or another persistent connector makes far more sense, and is what I plan to do.

Unless you have a very, very specific need (e.g. Tornado), running a Python HTTP server in production and then HTTP-proxying to it is inefficient and a terrible idea. (Easy deployment model, terrible overhead/performance.)

Anyway, in terms of aggregate, I mean something like a "site" that is made up of many "applications", and maybe those applications are interdependent in some fashion.  That adds lots of complications, and though there are lots of use cases for that I think it's easier to think in terms of apps as simpler building blocks for now.

That's not complicated at all; I do those types of aggregate sites fairly regularly. E.g.

/ - CMS
/location - Location & image database.
/resource - Business database.
/admin - Flex administration interface.

That's done at the Nginx/Apache level, where it's most efficient to do so, not in Python.

Sure; these would be tool options, and if you set everything up you are requiring the deployer to invoke the tools correctly to get everything in place, which is a fine starting point before formalizing anything.

What? Not even close: the person deploying an application relies on the application server/service to configure the web server of choice; there is no need for deployer action after the initial "Nginx, include all .conf files from folder X", where folder X is managed by the app server. (That's one line in /etc/nginx/nginx.conf.)

Hm... I guess this is an ordering question.  You could import logging and set up defaults, but that doesn't give the container a chance to overwrite those defaults.  You could have the container set up logging, then make sure the app sets defaults only when the container hasn't -- but I'm not sure if it's easy to use the logging module that way.

The logging configuration, in dict form, is passed from the app server to the container. The default logging levels are read by the app server from the container. It's trivially easy, especially when INI and YAML files can be programmatically created.
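Roughly like this on the container side; logging.config.dictConfig is standard library (2.7/3.2+), while the YAML itself is just an illustration:

# Illustrative only: the container applies logging configuration handed to
# it by the app server as a plain dictionary (here parsed from YAML).
import logging.config
import yaml  # PyYAML

LOGGING_YAML = """
version: 1
formatters:
    simple:
        format: '%(asctime)s %(levelname)s %(name)s: %(message)s'
handlers:
    console:
        class: logging.StreamHandler
        formatter: simple
root:
    level: INFO
    handlers: [console]
"""

logging.config.dictConfig(yaml.safe_load(LOGGING_YAML))
logging.getLogger('myapp').info('logging configured by the container')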

Well, maybe that's not hard -- if you have something like silvercustomize.py that is always imported, and imported fairly early on, and have the container overwrite logging settings before it *does* anything (e.g., sends a request), then you should be okay?

Indeed; container-setup.py or whatever.

Rich configurations are problematic in their own ways.  While the str-key/str-value of os.environ is somewhat limited, I wouldn't want anything richer than JSON (list, dict, str, numbers, bools).

JSON is a subset of YAML. I honestly believe YAML meets the requirements for richness, simplicity, flexibility, and portability that a configuration format really needs.

And then we have to figure out a place to drop the configuration.  Because we are configuring the *process*, not a particular application or request handler, a callable isn't great (unless we expect the callable to drop the config somewhere and other things to pick it up?)

I've already mentioned an environment variable, APP_CONFIG_PATH, identifying the path to the on-disk configuration file; it would be read in and acted upon by the container-setup.py file, which is imported before the rest of the application. Also, the application-factory idea of passing in the already-parsed configuration dictionary is quite handy here.
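In sketch form (APP_CONFIG_PATH and container-setup.py are from the discussion above; the factory itself is a made-up placeholder):

# Illustrative only: what a container-setup.py might do with the
# APP_CONFIG_PATH environment variable mentioned above.
import os
import yaml  # PyYAML


def make_application(config):
    # Placeholder for the application's real factory, which would build and
    # return the configured WSGI callable.
    def application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [config.get('greeting', 'hello').encode('utf-8')]
    return application


with open(os.environ['APP_CONFIG_PATH']) as handle:
    config = yaml.safe_load(handle)

# Hand the already-parsed configuration dictionary to the application factory.
application = make_application(config)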

I found at least giving one valid hostname (and yes, it should include a path) was important for many applications.  E.g., a bunch of apps have tendencies to put hostnames in the database.

Luckily, that's a bad habit we can discourage.  ;)

I'm not psyched about pointing to a file, though I guess it could work -- it's another kind of peculiar drop-the-config-somewhere-and-wait-for-someone-to-pick-it-up.  At least dropping it directly in os.environ is easy to use directly (many things allow os.environ interpolation already) and doesn't require any temporary files.  Maybe there's a middle ground.

Picked up by the container-setup.py site-customize script. What's the limit on the size of a variable in the environ? (Also, that memory gets permanently allocated for the life of the application; not very efficient if we're just going to convert it to a rich internal structure.)

:: Application (package) name.

This doesn't seem meaningful to me -- there's no need for a one-to-one mapping between these applications and a particular package.  Unless you mean some attempt at a unique name that can be used for indexing?

You're mixing something up here. Each application is a single primary package with its dependencies; one container per application.

It would also need a way to specify things like what port to run on

Automatically allocated by the app server.

public or private interface

Chosen by the deployer during deployment time configuration.

maybe indicate what kind of proxying (if any) is valid

If it's WSGI, it's irrelevant. If it's a network service, it shouldn't be HTTP.

maybe process management parameters

For WSGI apps, it's transparent. Each app server would have its own preference (e.g. mine will prefer FastCGI on-disk sockets) and the application will be blissfully unaware of that.

ways to inspect the process itself (since *maybe* you can't send internal HTTP requests into it), etc.

Interesting idea, not sure how that would be implemented or used, though.

PHP! ;)

PHP can be deployed as a WSGI application.  :P

I'm not personally that happy with how App Engine does it, as an example -- it requires a regex-based dispatch.

Regex dispatch is terrible. (I've actually encountered Python's 56KiB regular expression size limit on one project!) Simply exporting folders as "top level" webroots would be sufficient, methinks.
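Something like plain prefix dispatch over the exported webroots would cover it; a sketch (the mount table and app names are placeholders):

# Illustrative only: longest-prefix dispatch over exported "webroots"
# instead of a pile of regexes.

def make_dispatcher(mounts):
    """mounts: mapping of path prefix -> WSGI application."""
    ordered = sorted(mounts.items(), key=lambda item: len(item[0]), reverse=True)

    def dispatcher(environ, start_response):
        path = environ.get('PATH_INFO', '/')
        for prefix, app in ordered:
            if path == prefix or path.startswith(prefix.rstrip('/') + '/'):
                if prefix != '/':
                    # Shift the matched prefix from PATH_INFO onto SCRIPT_NAME.
                    environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                    environ['PATH_INFO'] = path[len(prefix):]
                return app(environ, start_response)
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'not found']

    return dispatcher

# e.g. site = make_dispatcher({'/': cms_app, '/location': location_app}),
# where cms_app and location_app are whatever WSGI callables got exported.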

Anything "string-like" or otherwise fancy requires more support libraries for the application to actually be able to make use of the environment.  Maybe necessary, but it should be done with great reluctance IMHO.

I've had great success with string-likes in WebCore/Marrow and TurboMail for things like e-mail address lists, e-mail addresses, and URLs.
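Not the actual WebCore/Marrow or TurboMail classes, just the general idea of a "string-like": it behaves as a str everywhere existing code expects one, with a little parsed structure bolted on:

# Illustrative only -- not the real library classes.  A "string-like"
# subclasses str, so existing code keeps working unchanged, while exposing
# a bit of structure.

class Address(str):
    """An e-mail address that still behaves as an ordinary string."""

    @property
    def local(self):
        return self.partition('@')[0]

    @property
    def domain(self):
        return self.partition('@')[2]


addr = Address('alice@example.com')
print(addr.upper())   # ordinary str behaviour: ALICE@EXAMPLE.COM
print(addr.domain)    # extra structure: example.com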

        — Alice.

