On Tue, 20 May 2008 18:10:31 -0700 (PDT) Marcus Cavanaugh <[EMAIL PROTECTED]> wrote:
> I've read the ShardedSession docs a few times. The shard_chooser()
> callable is straightforward, and thanks to the example [1], I think I
> understand how to use query_chooser(); but I need a pointer about how
> to use id_chooser() properly. The docs describe id_chooser as this: "a
> function which can return a list of shard ids which apply to a
> particular instance identifier", but that seems unclear to me.
>
> Say I issue the following queries:
>
> 1) Session.query(User).filter_by(id=4)
> 2) Session.query(UnrelatedTable).filter_by(id=567)
> 3) Session.query(UserProfiles).filter_by(user_id=4) # where
> UserProfiles is mapped to a User class
>
> In the first query, the id_chooser would simply return the proper
> shard, since I know that I'm querying for a User. The second query
> searches for a completely unrelated (non-sharded) table; how would I
> inspect the id_chooser()'s "query" parameter to determine that it's
> searching for a user? Lastly, in the third query above, which only
> indirectly wants to find a User class, which shard callable would be
> invoked and how do I make sure it finds the right shard for the
> associated User?
>

Your id_chooser is only consulted when the IDs of one or more mapped
objects that need to be retrieved are explicitly known. More
specifically, in the current implementation, it is only ever called from
ShardedQuery's get() and load() methods, which always accept IDs.

This is distinct from *filtering* on the ID columns in a query; whether
the expressions used in a filter constitute an "ID" is not even
considered in that code path, or at least not for the purposes of
sharding. So, AFAICT, none of the above examples ever consults
id_chooser; they only go through query_chooser.

As for what id_chooser ought to do: if the query is for something that
is sharded, return a sequence of shards that should be tried to retrieve
the object with that ID.
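To make that concrete, here is a minimal sketch of two id_chooser
callables. It is plain Python with no SQLAlchemy import; the
(query, ident) signature follows the sharding example, while the shard
names and the modulo placement rule are made up for illustration and are
not anything the library prescribes.

```python
# Hypothetical shard ids; a real setup would associate each with an engine.
SHARD_IDS = ["shard_a", "shard_b", "shard_c"]


def id_chooser(query, ident):
    """Return every shard as a candidate for a lookup by primary key.

    Both arguments are ignored here: `query` is the query being run and
    `ident` is the primary key value (or values) being looked up.
    """
    return list(SHARD_IDS)


def narrowed_id_chooser(query, ident):
    """Return a single shard, ASSUMING integer ids are placed by modulo.

    The placement rule is an assumption for this sketch. The identifier
    may arrive as a scalar or as a sequence, so both are handled.
    """
    key = ident[0] if isinstance(ident, (list, tuple)) else ident
    return [SHARD_IDS[key % len(SHARD_IDS)]]
```

Either function would be passed as the id_chooser argument when setting
up the sharded session. Neither inspects the query, which is fine for
get() and load() because the ID is already known at that point.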
As far as I can tell, the SQL that will be issued in such a case is
going to have "WHERE some_primary_keys=some_constant" (the ID), and
therefore will use indexing and go fast, so most implementations of
id_chooser I've seen or written just return a list of all shards
unconditionally. You can probably start off with such an implementation
and give it more smarts later if you find that querying all shards is
causing performance problems.

> A related question: Should I use a separate "Session" to separate the
> sharded tables from the non-sharded ones, in the case where I would
> not need to aggregate queries between the two sessions?

There are a few options here. You can do that, or you can check the
query in your *_chooser functions, decide that the query at hand isn't
going to span shards, and just return a single shard.

Another option is to use ShardedQuery.set_shard(), which returns a new
query limited to a single shard. Aggregation and shard-selection smarts
(including the *_chooser functions) get bypassed when you do this.

Hope this helps,
Kyle