[sqlalchemy] Re: ShardedSession and id_chooser

Kyle Schaffrick Wed, 21 May 2008 12:11:07 -0700

On Tue, 20 May 2008 18:10:31 -0700 (PDT)
Marcus Cavanaugh <[EMAIL PROTECTED]> wrote:


> I've read the ShardedSession docs a few times. The shard_chooser()
> callable is straightforward, and thanks to the example [1], I think I
> understand how to use query_chooser(); but I need a pointer about how
> to use id_chooser() properly. The docs describe id_chooser as this: "a
> function which can return a list of shard ids which apply to a
> particular instance identifier", but that seems unclear to me.
> 
> Say I issue the following queries:
> 
> 1) Session.query(User).filter_by(id=4)
> 2) Session.query(UnrelatedTable).filter_by(id=567)
> 3) Session.query(UserProfiles).filter_by(user_id=4) # where
> UserProfiles is mapped to a User class
> 
> In the first query, the id_chooser would simply return the proper
> shard, since I know that I'm querying for a User. The second query
> searches for a completely unrelated (non-sharded) table; how would I
> inspect the id_chooser()'s "query" parameter to determine that it's
> searching for a user? Lastly, in the third query above, which only
> indirectly wants to find a User class, which shard callable would be
> invoked and how do I make sure it finds the right shard for the
> associated User?
> 

Your id_chooser is only consulted when the IDs of one or more mapped
objects that need to be retrieved are explicitly known. More
specifically, in the current implementation, it is only ever called from
ShardedQuery's get() and load() methods, which always accept IDs.

This is distinct from *filtering* on the ID columns in a query; whether
the expressions used in a filter constitute an "ID" is not even
considered in that code path, or at least not for the purposes of
sharding.

So, AFAICT, none of the above examples actually ever consult the
id_chooser, just query_chooser.
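
To make the distinction concrete, here is a toy model of that dispatch.
It is not the real ShardedQuery: ToyShardedQuery and its methods are
stand-ins made up for illustration, and only the two callback
signatures, id_chooser(query, ident) and query_chooser(query), are meant
to mirror the real hooks.

```python
class ToyShardedQuery:
    """Stand-in for illustration only; NOT sqlalchemy's ShardedQuery."""

    def __init__(self, id_chooser, query_chooser):
        self.id_chooser = id_chooser
        self.query_chooser = query_chooser
        self.criteria = {}
        self.consulted = []  # records which chooser each call went through

    def filter_by(self, **kwargs):
        # Filtering only records criteria; no chooser is consulted here.
        self.criteria.update(kwargs)
        return self

    def all(self):
        # Executing a filtered query asks query_chooser for shards,
        # even when the filter happens to be on the primary key.
        self.consulted.append("query_chooser")
        return list(self.query_chooser(self))

    def get(self, ident):
        # Loading by explicit identity asks id_chooser for shards.
        self.consulted.append("id_chooser")
        return list(self.id_chooser(self, ident))


def id_chooser(query, ident):
    return ["shard_a", "shard_b"]  # hypothetical shard ids


def query_chooser(query):
    return ["shard_a", "shard_b"]


q = ToyShardedQuery(id_chooser, query_chooser)
q.filter_by(id=4).all()  # shaped like query 1 in the question
q.get(4)                 # an explicit lookup by ID
print(q.consulted)
```

The filter_by(id=4) call goes through query_chooser even though it
filters on the ID; only the get(4) call reaches id_chooser.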

As for what id_chooser ought to do: if the query is for something that
is sharded, return a sequence of shards that should be tried to
retrieve the object with that ID.

As far as I can tell, the SQL that will be issued in such a case is
going to have "WHERE some_primary_keys=some_constant" (the ID), and
therefore will use indexing and go fast, so most implementations of
id_chooser I've seen or written just return a list of all shards
unconditionally. You can probably start off with such an implementation
and give it more smarts later if you find it's causing performance
problems to query all shards.
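
A minimal sketch of both stages, assuming three shards with made-up
names. The smarter variant further assumes integer primary keys that
were distributed across shards by modulo, which is purely an
illustration; use whatever rule actually placed your rows.

```python
# Hypothetical shard ids; in practice these would match the shard ids
# your ShardedSession was configured with.
ALL_SHARDS = ["shard_a", "shard_b", "shard_c"]


def id_chooser(query, ident):
    """Simple version: try every shard for any identity."""
    return list(ALL_SHARDS)


def smarter_id_chooser(query, ident):
    """Possible refinement, assuming integer keys placed by modulo.

    The modulo scheme is an assumption for this sketch; it only works if
    the rows were distributed across the shards the same way.
    """
    # ident may arrive as a scalar or as a sequence of key values.
    key = ident[0] if isinstance(ident, (list, tuple)) else ident
    return [ALL_SHARDS[key % len(ALL_SHARDS)]]
```

Neither function looks at the query argument; it is there if you need
to vary the answer by what is being loaded.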

> A related question: Should I use a separate "Session" to separate the
> sharded tables from the non-sharded ones, in the case where I would
> not need to aggregate queries between the two sessions?

There are a few options here. You can do that, or you can check the
query in your *_chooser functions, decide that the query at hand isn't
going to span shards, and just return a single shard.
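
The second option might look roughly like this. Everything named here
is an assumption for the sketch: the shard ids, the idea that all
non-sharded tables live on one default shard, and especially
queried_class(), a hypothetical helper. How you pull the mapped class
out of a real Query depends on your SQLAlchemy version, so the sketch
uses a stand-in query object that just carries it as an attribute.

```python
ALL_SHARDS = ["shard_a", "shard_b", "shard_c"]
DEFAULT_SHARD = "shard_a"  # assumed home of all non-sharded tables


class User:
    """Placeholder for a mapped class that is sharded."""


class UnrelatedTable:
    """Placeholder for a mapped class that is not sharded."""


SHARDED_CLASSES = {User}


def queried_class(query):
    # Hypothetical helper: replace the body with whatever your
    # SQLAlchemy version offers for finding the class being queried.
    return query.entity


def query_chooser(query):
    """Return one shard for non-sharded classes, all shards otherwise."""
    if queried_class(query) not in SHARDED_CLASSES:
        return [DEFAULT_SHARD]
    return list(ALL_SHARDS)


class FakeQuery:
    """Stand-in for a Query so the sketch runs without a database."""

    def __init__(self, entity):
        self.entity = entity
```

With this in place, query_chooser(FakeQuery(UnrelatedTable)) yields only
the default shard, while a query for User still fans out to all of them.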

Another option is to use ShardedQuery.set_shard(), which returns a new
query limited to a single shard. Aggregation and shard-selection smarts
(including *_chooser functions) get bypassed when you do this.

Hope this helps,
Kyle

