I just wanted to toss out one more use case that I know I would find especially useful. Legacy databases.
I'd like to see the ability to setup a model (using rough introspection and then user tweaking) based on a legacy database, maybe with the options of read-write or read-only access. Like Robert, I'll have to give this another read through tomorrow after a little sleep. -Sean On Sep 10, 5:27 pm, Robert Lofthouse <[EMAIL PROTECTED]> wrote: > Wow, i'm with Malcolm on the "steady on, let's take this a bit at a > time" :) Someone has been working hard, but that's a huge amount to > think about. > > Reading through some of the comments: Most of the functionality > described here would be an explicit mapping and choice made by the > developer, it would be pretty hard to use any form of magic. I'm all > for the setting up of bindings in a settings.py, which you later > access and pass through whenever you do a select/update/delete. I'm > also for per-model connections/connection overrides, with an option to > override on a more general level at the (overseeing) application > level. > > I'm against the idea of having functions in settings.py, but I don't > see why they can't be discovered based on definitions in the settings > file. > > Will have a proper read through tomorrow, as this thread is > interesting. > > On Sep 10, 6:53 pm, Simon Willison <[EMAIL PROTECTED]> wrote: > > > For those who weren't at DjangoCon, here's the state of play with > > regards to multi-db support: Django actually supports multiple > > database connections right now: the Query class (in django/db/models/ > > sql/query.py) accepts a connection argument to its constructor, and > > the QuerySet class (django/db/models/query.py) can be passed an > > optional Query instance - if you don't pass it one, it will create a > > Query that uses the default django.db.connection. > > > As a result, if you want to talk to a different connection you can do > > it right now using a custom manager: > > > class MyManager(Manager): > > > def get_query_set(self): > > query = sql.Query(self.model, my_custom_db_connection) > > return QuerySet(self.model, query) > > > As Malcolm described it, he's provided the plumbing, now we need to > > provide the porcelain in the form of a powerful, user-friendly API to > > this stuff. Here's my first attempt at an API design. > > > Requirements > > ============ > > > There are a number of important use-cases for multi-db support: > > > * Simple master-slave replication: SELECT queries are distributed > > between slaves, while UPDATE and DELETE statements are sent to > > the master. > > * Different Django applications live on different databases, e.g. > > a forum on one database while blog lives on another. > > * Moving data between different databases - it would be useful if > > you could do this using the ORM to help paper over the > > differences in SQL syntax. > > * Sharding: data in a single Django model is distributed across > > multiple databases depending on some kind of heuristic (e.g. by > > user ID, or with archived content moved to a different server) > > * Replication tricks, for example if you use MySQL's InnoDB for > > your data but replicate to a MyISAM server somewhere so you can > > use MySQL full-text indexing to search (Flickr used to do this). > > > I've probably missed some; please feel free to fill in the gaps. > > > We don't need to solve every problem, but we do need to provide > > obvious hooks for how those problems should be solved. Sharding, for > > example, is extremely application specific. I don't think Django > > should automatically shard your data for you if you specify 'sharding > > = True' on a model class, but it should provide documented hooks for > > making a custom decision on which database connection should be used > > for a query that allow sharding to be implemented without too much > > pain. > > > Different applications on different databases on the other hand is > > something Django should support out of the box. Likewise, master-slave > > replication is common enough that it would be good to solve it with as > > few lines of user-written code as possible (it's the first step most > > people take to scale their database after adding caching - and it's a > > sweet spot for the kind of read-heavy content sites that Django is > > particularly suited for). > > > Proposed API > > ============ > > > Here's my first attempt at describing a user-facing API. > > > First, we need a way of specifying multiple database connections. > > Adrian has already expressed an interest in moving to DSNs rather than > > individual settings, so I suggest something like this: > > > DATABASES = { > > 'default': 'mysql://foo:[EMAIL PROTECTED]/baz', > > > } > > > With multiple databases configured this could be: > > > DATABASES = { > > 'master': 'mysql://foo:[EMAIL PROTECTED]/mydb', > > 'slave1': 'mysql://foo:[EMAIL PROTECTED]/mydb', > > 'slave2': 'mysql://foo:[EMAIL PROTECTED]/mydb', > > 'archive': 'mysql://foo:[EMAIL PROTECTED]/mydb', > > 'default': 'master', > > > } > > > There are two types of connection string - DSNs and aliases. A DSN > > contains '://' while an alias does not. Aliases can be used even > > within the DATABASES setting itself, as with 'default' in the above > > example. > > > It should be possible to use a DSN that has not been defined in the > > DATABASES setting. As a result, I propose that anywhere in Django that > > accepts a connection alias should also accept a DSN or even a raw DB- > > API compliant connection object. > > > The QuerySet.using() method > > --------------------------- > > > Next, we need a way of telling Django which connection to use. I > > propose a new queryset method as the lowest level way of doing this, > > called 'using': > > > qs = Article.objects.filter(published__lt = ...).using('archive') > > > "using(alias_or_connection_or_dsn)" simply tells the QuerySet to > > execute against a different connection, by updating its > > internal .connection attribute. > > > Other options for this method name include: > > > with_db() > > with_connection() > > > I preferred "using()" as it reads nicely and doesn't contain an > > underscore. > > > using() represents the lowest level user-facing API. We can cover a > > common case (different applications on different databases) with the > > following: > > > class Article(models.Model): > > ... > > class Meta: > > using = 'articles' > > > This means "use the articles connection for all queries originating > > with this model". I'm repurposing the term 'using' here for API > > consistency. > > > Advanced connection selection > > ----------------------------- > > > All of the other above use-cases boil down to one key decision: given > > a particular database query, which database connection should I > > execute the query against? > > > I propose adding a manager method which is called every time that > > decision is made, and which is designed to be over-ridden by advanced > > users. Here's the default implementation: > > > class Manager: > > ... > > def get_connection(self, query): > > from django.db import connection > > return connection # Use the default connection for everything > > > Here's an implementation which implements very simple master-slave > > replication: > > > class Manager: > > ... > > def get_connection(self, query): > > if isinstance(query, (InsertQuery, DeleteQuery, UpdateQuery)): > > return 'master' > > else: > > return 'slave' > > # Or if we have more than one slave: > > return random.choice(['slave1', 'slave2']) # Footnote [1] > > > The above would be even easier if InsertQuery, DeleteQuery and > > UpdateQuery were all subclasses of a ModificationQuery class (they are > > currently all direct subclasses of Query) - then the check could > > simply be: > > > if isinstance(query, ModificationQuery) > > > We could even ship a MasterSlaveManager that implements a variant of > > the above logic in django.contrib.masterslave (more for educational > > and marketing purposes than because it's something that's hard to > > implement). > > > Note that in my above example get_connection() methods one returns an > > actual connection object while the other returns a connection alias. > > This makes for a more convenient API, and is consistent with my above > > suggestion that DSNs, aliases and connection objects should be > > interchangeable. > > > Since the get_connection method has access to the full query object, > > even complex sharding schemes based on criteria such as the individual > > fields being looked up in the query could be supported reasonably > > well. > > > Dealing with single queries that span multiple databases > > -------------------------------------------------------- > > > Once you have different tables living in different databases there's > > always the chance that someone will try to write a query that attempts > > to join tables that live on two different database servers. I don't > > think we should address this problem at all (aside from maybe > > attempting to throw a descriptive error message should it happen) - if > > you're scaling across different servers you need to be aware of the > > limitations of that approach. > > > That said, databases like MySQL actually do allow cross-database joins > > provided both databases live on the same physical server. Is this > > something we should support? I'd like to say "no" and assume that > > people who need to do that will be happy rolling their own SQL using a > > raw cursor, but maybe I'm wrong and it's actually a common use case. > > > Connection pooling > > ------------------ > > > This is where I get completely out of my depth, but it seems like we > > might need to implement connection pooling at some point since we are > > now maintaining multiple connections to multiple databases. We could > > roll our own solution here, but to my knowledge SQLAlchemy has a solid > > connection pool implementation which is entirely separate from the > > rest of the SQLAlchemy ORM. We could just ensure that if someone needs > > connection pooling there's a documented way of integrating the > > SQLAlchemy connection pool with Django - that way we don't have an > > external dependency on SQL Alchemy for the common case but people who > > need connection pools can still have them. > > > Backwards compatibility > > ----------------------- > > > I think we can do all of > > ... > > read more » --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~----------~----~----~----~------~----~------~--~---