On Jun 13, 2010, at 4:27 PM, Henry Precheur wrote:

> I'm starting a new project which will roughly be hosting a bunch of sites. I
> want the sites to be isolated: they'll all have the same schema (data
> definition), but will store their data in different 'name-spaces'.
> 
> The front end will be a Python web app, and I'd like to have this
> application talk to all the 'name-spaces' at the same time:
> 
>  With a request on example.com/site1, the web app will use the
>  'name-space' site1, with example.com/site2 it will use site2.
> 
> I am using Postgres. There are 2 options for the 'name-space': Database
> or Schema [1]:
>  1. One database per site
>  2. One database for all the sites and 1 schema per site
> 
> 
> Solution #1 would require maintaining one connection per site and per Python
> process. That means lots of connections, and lots of memory. On
> the other hand, this solution is supported by SQLAlchemy out of the box.
> I'd have a dictionary like this:
> 
>    {'site1': Engine('postgres://.../site1'),
>     'site2': Engine('postgres://.../site2'), ...}
> 
> And whenever a request comes in I get the right engine via this
> dictionary.
> 
> 
> Solution #2 is not supported natively by SQLAlchemy. Each time a request
> comes in I'd have to issue an additional query "SET search_path TO
> MY_SITE", where MY_SITE is the schema associated with the site.
> 
> 
> Solution #2 seems much more lightweight to me. The only concern is the
> small overhead added by the extra query on each request.
> 
> What do you guys think? Will I get into trouble with solution #2?
> 
> If you have alternative suggestions I'd like to hear them :)

You can do the multi-schema thing using either a PoolListener, or a Table 
subclass that implements a @property for the "schema" attribute - in both cases 
these would be linked to a thread-local variable that you set per 
request.   If the load per site is low, you can also forgo pooling altogether 
to eliminate the issue of dormant connections (possibly look into 
something like pgpool; it may have a feature that optimizes this use 
case).
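
To make that concrete, here's a minimal sketch of the PoolListener + thread-local 
idea (SQLAlchemy 0.6-era listener API).  The thread-local and the 
set_current_schema() helper are names I'm making up - your framework's 
per-request hook would call it with whatever site the URL maps to:

    import threading

    from sqlalchemy import create_engine
    from sqlalchemy.interfaces import PoolListener

    _request = threading.local()

    def set_current_schema(name):
        # called by the web app at the start of each request,
        # e.g. set_current_schema('site1') for example.com/site1
        _request.schema = name

    class SearchPathListener(PoolListener):
        def checkout(self, dbapi_con, con_record, con_proxy):
            # runs each time a connection is checked out of the pool;
            # point it at the current site's schema
            schema = getattr(_request, 'schema', 'public')
            cur = dbapi_con.cursor()
            cur.execute("SET search_path TO %s" % schema)
            cur.close()

    engine = create_engine('postgresql://user@localhost/mydb',
                           listeners=[SearchPathListener()])

The cost is one cheap SET per pool checkout rather than per query, and as long 
as the schema name comes from your own site list rather than user input, the 
string interpolation is fine.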

As far as which approach to take, consider the load each site will have, and how 
you might scale out to multiple database servers.  I'd grep around the 
PG docs and wiki carefully to get a full grasp of the implications of each 
approach.
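
For what it's worth, the engine-per-database dispatch from your option #1 also 
scales out naturally, since each URL can point at a different host.  A sketch 
(the site names and hosts are placeholders):

    from sqlalchemy import create_engine

    engines = {
        'site1': create_engine('postgresql://user@db-a.example.com/site1'),
        'site2': create_engine('postgresql://user@db-b.example.com/site2'),
    }

    def engine_for_request(path):
        # e.g. '/site1/some/page' -> engines['site1']
        site = path.lstrip('/').split('/', 1)[0]
        return engines[site]

Each engine keeps its own pool, which is where the per-process connection count 
adds up - that's the tradeoff you already identified.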

