[sqlalchemy] Re: Need help with the shard session and query get function

Jay Decker Fri, 04 Jul 2008 16:34:19 -0700

> its nonsensical to call upon the ShardedSession *within* the
> query_chooser def.  The ShardedSession can't do a query without a
> return value from the chooser, so that it knows which engine to
> query.


I forgot to mention that my lookup table is also sharded (in other words,
the lookup table is split into buckets).  Before I insert data into the Post
table (done via post_data.py), I determine which lookup table bucket to go to
and retrieve the name of the shard the user belongs to.  That is why I have
done this inside def shard_chooser_post of module post_config.py:

# sesslk is a ShardedSession for the lookup table, which is split into buckets.
sesslk = create_session_lookup()
def shard_chooser_post(mapper, instance, clause=None):

    querylk = sesslk.query(Lookup)
    # Pick the lookup table bucket by hash modulo on instance.username and
    # get the {username, shardname} mapping from it.
    lk_rec = querylk.get([instance.username])

Above, I am using sesslk (the lookup table's sharded session) to hit one of
the lookup table buckets using instance.username and to get the record with
the {username, shardname} mapping.  For the lookup table, the username-to-bucket
mapping is fixed by a modulo.  The other tables (user, post, comment) are
sharded dynamically through the {username, shardname} mapping.  Having a lookup
table allows resharding later, so particular users can be moved between
shards.  I know Hibernate does this with virtual shards (but the number of
virtual shards in Hibernate is fixed up front).
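
To make the two levels concrete, here is a minimal sketch in plain Python. It
is not the code from post_config.py: the bucket count, the use of zlib.crc32
as the hash, and the names lookup_bucket_for, assign_user, and shard_for are
all assumptions for illustration, and dicts stand in for the bucketed lookup
tables.

```python
import zlib

NUM_LOOKUP_BUCKETS = 4  # assumed bucket count, fixed up front


def lookup_bucket_for(username):
    """Fixed mapping: hash the username and take it modulo the bucket count."""
    # crc32 is used because it is stable across processes, unlike hash().
    return zlib.crc32(username.encode("utf-8")) % NUM_LOOKUP_BUCKETS


# One dict per bucket stands in for one bucket of the lookup table.
lookup_buckets = [dict() for _ in range(NUM_LOOKUP_BUCKETS)]


def assign_user(username, shardname):
    """Dynamic mapping: store {username: shardname} in the user's bucket."""
    lookup_buckets[lookup_bucket_for(username)][username] = shardname


def shard_for(username):
    """Find the bucket by modulo, then read the shard name from that bucket."""
    return lookup_buckets[lookup_bucket_for(username)][username]


assign_user("alice", "shard_1")
assign_user("bob", "shard_2")

# Resharding one user only changes that user's lookup record
# (the user's rows would still have to be copied to the new shard).
assign_user("alice", "shard_2")
```

The bucket for a username never changes, while the shard name stored in that
bucket can be updated at any time, which is what makes moving users possible.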

>I'm fairly confused about how your scheme is to work here,
> but I'd assume that one of your shards happens to contain some
> information to be used, so call upon the desired engine directly
> within this function, i.e. engine.execute("select my_shard_id from
> my_shard_table where foo='bar'").fetchall() .   I don't quite get how
> sharding is going to help you here in the first place;  you aren't
> getting any performance/clustering advantages (since you're relying on
> a big slow query every time to one monolithic database) and you aren't
> saving on rows, either (since you have one monolithic table with a row
> for every piece of data in all the other databases).

I am not sure what you mean here.  The only monolithic database you might be
referring to is the lookup table, but that is not monolithic: it has been
split into buckets.  Sorry, I forgot to mention this in my earlier post.  All
the other tables are sharded horizontally.  The only reason for opting out of
foreign keys was that they become useless when you want to take a table out of
the current shard and place it into its own vertical partition, and vice
versa.  Foreign keys don't work across vertical shards.

Here is the complete code for what I am trying to do.  I know it is a bit
long, but it may be helpful for those who come later searching for sharding.

One more question I have is regarding ShardedSessions.  When the user, post,
and comment tables are sharded based on username, which is dynamically looked
up in the lookup table, does each table require a separate ShardedSession?
Static/fixed sharding can be achieved by hard-coding the modulo hash
function and having one ShardedSession for all tables that are sharded by one
common field such as username.  After reading up on sharding, I found that
many tend to recommend dynamic sharding.
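
For comparison, the static variant can be sketched as a single chooser that
looks only at instance.username, so the same function gives the same answer
for every table sharded on that field.  This does not settle whether one
ShardedSession is enough; it only shows that the chooser logic need not
differ per table.  The shard names and the plain User/Post/Comment classes
below are hypothetical stand-ins for the mapped classes, and the function
follows the shard_chooser signature used above.

```python
import zlib

SHARD_NAMES = ["shard_1", "shard_2", "shard_3"]  # assumed, hard-coded shard list


class User:
    def __init__(self, username):
        self.username = username


class Post:
    def __init__(self, username, body):
        self.username = username
        self.body = body


class Comment:
    def __init__(self, username, text):
        self.username = username
        self.text = text


def shard_chooser(mapper, instance, clause=None):
    """Static sharding: hard-coded hash modulo on the common username field."""
    # mapper and clause are accepted to match the signature but are unused here.
    index = zlib.crc32(instance.username.encode("utf-8")) % len(SHARD_NAMES)
    return SHARD_NAMES[index]


# Every object belonging to the same user lands on the same shard.
objects = [User("alice"), Post("alice", "hello"), Comment("alice", "nice post")]
chosen = {shard_chooser(None, obj) for obj in objects}
```

The drawback is the one mentioned above: with the modulo hard-coded, a user
cannot be moved to another shard without changing the function itself.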
========================================
blog_engine.py (Setting up the database engine):

========================================
setup.py (create shards and bucket tables: lookup, post):

Run this first to create the buckets and shards in the db.  Then, to populate
the data for each table, run lookup_data.py and post_data.py.
========================================
lookup_config.py (bucketing configuration for lookup table):


========================================
post_config.py (sharding configuration for post table):

========================================
lookup.py (Lookup table model definition):

========================================
post.py (Post table model definition):

========================================
lookup_data.py (Load sample data into the buckets of the lookup table):

========================================
post_data.py (Load sample data into the shards of the post table.  Before
inserting records into the Post table, the username key is used to determine
the bucket where the {username, shardname} mapping is stored; then go to that
bucket and get the {username, shardname} record.  Insert the Post record into
the shard named by shardname):

========================================
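
The flow described for post_data.py can be sketched end to end with the
standard-library sqlite3 module instead of SQLAlchemy sessions.  This is not
the attached code: the in-memory databases, the table columns, the bucket
count, the shard names, and the helper names (bucket_for, register_user,
shard_name_for, insert_post) are all assumptions for illustration.  The
lookup is done with a direct query against the bucket's connection.

```python
import sqlite3
import zlib

NUM_BUCKETS = 2                      # assumed number of lookup buckets
SHARD_NAMES = ["shard_1", "shard_2"]  # assumed shard names

# One in-memory database per lookup bucket, each holding a lookup table.
buckets = [sqlite3.connect(":memory:") for _ in range(NUM_BUCKETS)]
for conn in buckets:
    conn.execute(
        "CREATE TABLE lookup (username TEXT PRIMARY KEY, shardname TEXT NOT NULL)"
    )

# One in-memory database per shard, each holding a post table.
shards = {name: sqlite3.connect(":memory:") for name in SHARD_NAMES}
for conn in shards.values():
    conn.execute(
        "CREATE TABLE post (id INTEGER PRIMARY KEY, username TEXT NOT NULL, "
        "body TEXT NOT NULL)"
    )


def bucket_for(username):
    """Pick the lookup bucket by hash modulo on the username."""
    return buckets[zlib.crc32(username.encode("utf-8")) % NUM_BUCKETS]


def register_user(username, shardname):
    """Store (or update) the {username, shardname} record in the user's bucket."""
    bucket_for(username).execute(
        "INSERT OR REPLACE INTO lookup (username, shardname) VALUES (?, ?)",
        (username, shardname),
    )


def shard_name_for(username):
    """Read the shard name for a user from the bucket chosen by modulo."""
    row = bucket_for(username).execute(
        "SELECT shardname FROM lookup WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        raise KeyError("no lookup record for %r" % username)
    return row[0]


def insert_post(username, body):
    """Look up the user's shard, insert the post there, return the shard name."""
    shardname = shard_name_for(username)
    shards[shardname].execute(
        "INSERT INTO post (username, body) VALUES (?, ?)", (username, body)
    )
    return shardname


register_user("alice", "shard_2")
used_shard = insert_post("alice", "first post")
```

Each insert costs one small primary-key query against a single bucket,
followed by one insert on the shard that the bucket names.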

My problem mainly lies in the query_chooser and id_chooser functions in
post_config.py and lookup_config.py.

Thank you, hope this outlines what I am trying to do.


Attachments: blog_engine.py, lookup.py, lookup_config.py, lookup_data.py,
post.py, post_config.py, post_data.py, setup.py
