On 2011-04-13 18:16:36 -0700, Ian Bicking said:
While I was initially reluctant to use zip files, after further
discussion and thought they seem fine to me, so long as any tool that takes a zip
file can also take a directory. The reverse might not be true -- for
instance, I'd like a way to install or update a library for (and
inside) an application, but I doubt I would make pip rewrite zip files
to do this ;) But it could certainly work on directories. Supporting
both isn't a big deal except that you can't do symlinks in a zip file.
I'm not talking about using zip files as per eggs, where the code is
maintained within the zip file during execution. It is merely a
packaging format with the software itself extracted from the zip during
installation / upgrade. A transitory container format. (Folders in
the end.)
Symlinks are an OS-specific feature, so those are out as a core
requirement. ;)
I don't think we're talking about something like a buildout recipe.
Well, Eric kind of brought something like that up... but otherwise I
think the consensus is in that direction.
Ambiguous statements FTW, but I think I know what you meant. ;)
So specifically if you need something like lxml the application
specifies that somehow, but doesn't specify *how* that library is
acquired. There is some disagreement on whether this is generally
true, or only true for libraries that are not portable.
+1
I think something along the lines of autoconf (those lovely ./configure
scripts you run when building GNU-style software from source) with
published base 'checkers' (predicates as I referred to them previously)
would be great. It would give an application a clear way to declare a
dependency, have the application server check those dependencies, and
then notify the administrator installing the package of any that are missing.
I've seen several Python libraries that include the C library code that
they expose; while not so terribly efficient (i.e. you can't install
the C library once, then share it amongst venvs), it is effective for
small packages.
Larger libraries (i.e. global or application-local ones) would require
the intervention of a systems administrator.
Something like a database takes this a bit further. We haven't really
discussed it, but I think this is where it gets interesting. Silver
Lining has one model for this. The general rule in Silver Lining is
that you can't have anything with persistence without asking for it as
a service, including an area to write files (except temporary files?)
+1
Databases are slightly more difficult; an application could ask for:
:: (Very Generic) A PEP-249 database connection.
:: (Generic) A relational database connection string.
:: (Specific) A connection string to a specific vendor of database.
:: (Odd) A NoSQL database connection string.
I've been making heavy use of MongoDB over the last year and a half,
but AFAIK each NoSQL database engine does its own thing API-wise. (Then
there are ORMs on top of that, but passing a connection string like
mysql://user:pass@host/db or mongo://host/db is pretty universal.)
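For illustration, a parser for connection strings of that shape needs
nothing beyond the standard library (the host and database names below
are made up):

```python
from urllib.parse import urlsplit

def parse_dsn(dsn):
    """Split a connection string like mysql://user:pass@host/db
    into its component parts."""
    parts = urlsplit(dsn)
    return {
        'scheme': parts.scheme,          # e.g. 'mysql' or 'mongo'
        'user': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,
        'database': parts.path.lstrip('/'),
    }

info = parse_dsn('mysql://user:secret@db.example.com:3306/appdb')
```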
It is my intention to write an application server that is capable of
creating and securing databases on-the-fly. This would require fairly
high-level privileges in the database engine, but would result in far
more "plug-and-play" configuration. Obviously when deleting an
application you will have the opportunity to delete the database and
associated user.
I assume everyone agrees that an application can't write to its own
files (but of course it could execfile something in another location).
+1; that _almost_ goes without saying. :) At the same time, an
application server /must not/ require root access to do its work, thus
no mandating of (real) chroots, on-the-fly user creation, etc.
There are ways around almost all security policies, but where possible
setting the read-only flag (Windows) or removing write (chmod -w on
POSIX systems) should be enough to prevent casual abuse.
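A rough sketch of what that could look like on the app server side
(POSIX-flavoured; on Windows you'd set the read-only attribute instead,
and os.chmod only affects the read-only flag there):

```python
import os
import stat

def make_read_only(root):
    """Strip the write bits from every file and directory under root --
    roughly a recursive 'chmod -w', enough to prevent casual abuse."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```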
I suspect there's some disagreement about how the Python environment
gets setup, specifically sys.path and any other application-specific
customizations (e.g., I've set environ['DJANGO_SETTINGS_MODULE'] in
silvercustomize.py, and find it helpful).
Similar to Paste's "here" variable for INI files, some method for the
application to define environment variables containing base path
references would be needed.
I've tossed out my idea of sharing dependencies, BTW, so a simple
extraction of the zipped application into one package folder (linked in
using a .pth file) with the dependencies installed into an app-packages
folder in the path (like site-packages) would be ideal. At least, for
me. ;)
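A sketch of what activating such a layout might look like, assuming a
hypothetical app-packages folder alongside the application root (the
function name and layout are my own invention):

```python
import os
import site
import sys

def activate_app(app_root):
    """Add an application's app-packages folder to sys.path,
    processing any .pth files it contains (like site-packages),
    then put the application itself first on the path."""
    app_packages = os.path.join(app_root, 'app-packages')
    if os.path.isdir(app_packages):
        site.addsitedir(app_packages)   # honours .pth files
    sys.path.insert(0, app_root)        # the application package itself
```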
Describing the scope of this, it seems kind of boring. In, for
example, App Engine you do all your setup in your runner -- I find this
deeply annoying because it makes the runner the only entry point, and
thus makes testing, scripts, etc. hard.
I agree; that's a short-sighted approach to an application container
format. There should be some way to advertise a test suite and, for
example, have the suite run before installation or during upgrade.
(Rolling back the upgrade process thus far if there is a failure.)
My shiny end goal would be a form of continuous deployment: a git-based
application which gets a post-commit notification, pulls the latest,
runs the tests, rolls back on failure or fully deploys the update on
success.
We would start with just WSGI. Other things could follow, but I don't
see any reason to worry about that now. Maybe we should just punt on
aggregate applications now too. I don't feel like there's anything we
would do that would prevent other kinds of runtime models (besides the
starting point, container-controlled WSGI), and the places to add
support for new things are obvious enough (e.g., something like Silver
Lining's platform setting). I would define a server with accompanying
daemon processes as an "aggregate".
Since in my model the application server does not proxy requests to the
instantiated applications (each running in its own process), I'm not
sure I'm interpreting what you mean by an aggregate application
properly.
If "my" application server managed Nginx or Apache configurations,
dispatch to applications based on base path would be very easy to do
while still keeping the applications isolated.
An important distinction to make, I believe, is application concerns
and deployment concerns. For instance, what you do with logging is a
deployment concern. Generating logging messages is of course an
application concern. In practice these are often conflated, especially
in the case of bespoke applications where the only person deploying the
application is the person (or team) developing the application. It
shouldn't be annoying for these users, though. Maybe it makes sense
for people to be able to include tool-specific default settings in an
application -- things that could be overridden, but especially for the
case when the application is not widely reused it could be useful. (An
example where Silver Lining gets it all backwards is that I created a
[production] section in app.ini when the very concept of "production"
is not meaningful in that context -- but these kind of named profiles
would make sense for actual application deployment tools.)
Having an application define default logging levels for different
scopes would be very useful. The application server could take those
defaults, and allow an administrator to modify them or define
additional scopes quite easily.
There's actually a kind of layered way of thinking of this:
1. The first, maybe most important part, is how you get a proper Python
environment. That includes sys.path of course, with all the
accompanying libraries, but it also includes environment description.
Virtualenv-like, with the application itself linked in via a .pth file
(a la setup.py develop, allowing inline upgrades via SCM) and
dependencies extracted from the zip distributable into an app-packages
folder a la site-packages.
I don't install global Python modules on any of my servers, so the
--no-site-packages option is somewhat unnecessary for me, but having
something similar would be useful, too. Unfortunately, that one
feature seems to require a lot of additional work.
In Silver Lining there are a few stages -- first, set some environment
variables (both general ones like $SILVER_CANONICAL_HOST and
service-specific ones like $CONFIG_MYSQL_DBNAME), then get sys.path
proper, then import silvercustomize, by which an environment can do any
further customization it wants (e.g., set $DJANGO_SETTINGS_MODULE).
Environment variables are typeless (raw strings) and thus less than
optimal for sharing rich configuration.
Host names depend on how the application is mounted, and a single
application may be mounted to multiple domains or paths, so utilizing
the front end web server's rewriting capability is probably the best
solution for that.
What about multiple database connections? Environment variables are
also not so good for repeated values.
A /few/ environment variables are a good idea, though:
:: TMPDIR — when don't you need temporary files?
:: APP_CONFIG_PATH — the path to a YAML file containing the real configuration.
The configuration file would even include a dict-based logging
configuration routing all messages to the parent app server for final
delivery, removing the need for per-app logging files, etc.
2. Define some basic generic metadata. "app_name" being the most obvious one.
The standard Python setup metadata is pretty good:
:: Application title.
:: Application (package) name.
:: Short description.
:: Long description / documentation.
:: Author information.
:: License.
:: Source information (URL, download URL).
:: Dependencies.
:: Entry point-style hooks. (Post-install, pre/post upgrade,
pre-removal, etc.)
Likely others.
3. Define how to get the WSGI app. This is WSGI specific, but (1) is
*not* WSGI specific (it's only Python specific, and would apply well to
other platforms).
I could imagine there would be multiple "application types":
:: WSGI application. Define a package dot-notation entry point to a
WSGI application factory.
:: Networked daemon. This would allow deployment of Twisted services,
for example. Define a package dot-notation entry point to the 'main'
callable.
Again, there are likely others, but those are the big two. In both of
these cases the configuration (loaded automatically) could be passed as
a dict to the callable.
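Resolving such a dot-notation entry point is only a few lines; this
sketch accepts both the 'module:attr' form and a plain dotted path
(the example app name is hypothetical):

```python
import importlib

def resolve(entry_point):
    """Resolve an entry point like 'myapp.wsgi:make_app'
    (or 'myapp.wsgi.make_app') to the callable it names."""
    if ':' in entry_point:
        module_name, attr = entry_point.split(':', 1)
    else:
        module_name, _, attr = entry_point.rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, attr)

# app_factory = resolve('myapp.wsgi:make_app')
# application = app_factory(config_dict)
```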
4. Define some *web specific* metadata, like static files to serve.
This isn't necessarily WSGI or even Python specific (not that we should
bend backwards to be agnostic -- but in practice I think we'd have to
bend backwards to make it Python-specific).
Explicitly defining the paths to static files is not just a good idea,
it's The Slaw™.
5. Define some lifecycle metadata, like update_fetch. These are
generally commands to invoke. IMHO these can be ad hoc, but exist in
the scope of (1) and a full "environment". So it's not radically
different than anything else the app does, it's just we declare
specific times these actions happen.
Script name, dot-notation callable, or URL. I see those as the 'big
three' to support. Using a dot-notation callable has the same benefit
as my comments to #3.
The URL would be relative to wherever the application is mounted within
a domain, of course.
6. Define services (or "resources" or whatever -- the name "resource"
doesn't make as much sense to me, but that's bike shedding). These are
things the app can't provide for itself, but requires (or perhaps only
wants; e.g., an app might be able to use SQLite, but could also use
PostgreSQL). While the list of services will increase over time,
without a basic list most apps can't run at all. We also need a core
set as a kind of reference implementation of what a fully-specified
service *is*.
I touched on this up above; any DB-API-compliant database connection or
one of the various configuration strings. (I'd implement this as a
string-like object with accessor properties so you can pass it to
SQLAlchemy straight, or dissect it to do something custom.)
More below.
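A minimal sketch of that string-like object (the class and property
names are my own invention):

```python
from urllib.parse import urlsplit

class ConnectionString(str):
    """A string that can be handed straight to SQLAlchemy (or anything
    else expecting a plain DSN) while also exposing its parts."""

    @property
    def _parts(self):
        return urlsplit(self)

    @property
    def scheme(self):
        return self._parts.scheme

    @property
    def username(self):
        return self._parts.username

    @property
    def password(self):
        return self._parts.password

    @property
    def host(self):
        return self._parts.hostname

    @property
    def database(self):
        return self._parts.path.lstrip('/')

dsn = ConnectionString('postgresql://app:secret@db.internal/appdb')
```

Because it subclasses str, it compares equal to the raw string and can
be passed anywhere a plain connection string is expected.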
7. In Silver Lining I've distinguished active services (like a running
database) from passive resources (like an installed binary library). I
don't see a reason to conflate these, as they are so very different.
Maybe this is part of why "resource" strikes me as an odd name for
something like a database.
You hit the terminology perfectly: active services (such as databases)
are just that, services. Installed binary libraries are resources. :)
So... there's kind of some thoughts about process.
Good stuff.
— Alice.
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe:
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com