[sqlalchemy] Re: Need help with the shard session and query get function

Jay Decker Fri, 04 Jul 2008 16:45:15 -0700

I forgot to attach my_metadata.py; here is its content:


from sqlalchemy import MetaData

a_metadata = MetaData()
b_metadata = MetaData()



On Fri, Jul 4, 2008 at 7:34 PM, Jay Decker <[EMAIL PROTECTED]> wrote:

> > it's nonsensical to call upon the ShardedSession *within* the
> > query_chooser def.  The ShardedSession can't do a query without a
> > return value from the chooser, so that it knows which engine to
> > query.
>
> I forgot to mention that my lookup table is also sharded (in other words,
> the lookup table is split into buckets). Before I insert data into the Post
> table (done via post_data.py), I determine which lookup table bucket to go
> to and retrieve the name of the shard the user belongs to. That is why I
> did this inside shard_chooser_post in the post_config.py module:
>
> # sesslk is a ShardedSession for the lookup table, which is split into buckets.
> sesslk = create_session_lookup()
> def shard_chooser_post(mapper, instance, clause=None):
>
>     querylk = sesslk.query(Lookup)
>     # Determines the lookup table bucket with a hash modulo on
>     # instance.username and gets the {username, shardname} mapping.
>     lk_rec = querylk.get([instance.username])
>
> Above, I use sesslk (the lookup table's sharded session) to hit one of the
> lookup table buckets with instance.username and get the record holding the
> {username, shardname} mapping. For the lookup table, the username-to-bucket
> mapping is fixed by a modulo. The other tables (user, post, comment),
> however, are sharded dynamically by the {username, shardname} mapping.
> Having a lookup table allows resharding dynamically later, so particular
> users can be moved between shards. I know Hibernate does this with virtual
> shards (but Hibernate fixes the number of virtual shards up front).
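A minimal sketch of the two-level routing described above, using hypothetical names (`NUM_BUCKETS`, `bucket_for`, `LookupDirectory`) and an in-memory dict per bucket in place of the real lookup tables. The bucket is fixed by a hash modulo, while the shard is whatever that bucket's mapping row says, so moving a user only means rewriting one row. `zlib.crc32` is used here instead of the built-in `hash()`, because `hash()` on strings is randomized per process and would send the same username to different buckets across runs.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical: number of lookup-table buckets, fixed up front


def bucket_for(username):
    """Static step: pick the lookup bucket with a stable hash modulo."""
    return zlib.crc32(username.encode("utf-8")) % NUM_BUCKETS


class LookupDirectory:
    """Stand-in for the bucketed lookup table: one dict per bucket,
    mapping username -> shard name (the dynamic part)."""

    def __init__(self):
        self.buckets = [dict() for _ in range(NUM_BUCKETS)]

    def assign(self, username, shard):
        self.buckets[bucket_for(username)][username] = shard

    def shard_for(self, username):
        # Look in exactly one bucket, then read the user's current shard.
        return self.buckets[bucket_for(username)][username]

    def move_user(self, username, new_shard):
        # Resharding one user only rewrites its mapping row;
        # the bucket holding that row never changes.
        self.assign(username, new_shard)


directory = LookupDirectory()
directory.assign("alice", "shard1")
directory.assign("bob", "shard2")
directory.move_user("alice", "shard3")
print(directory.shard_for("alice"))  # shard3
```

Copying the user's existing rows from the old shard to the new one is a separate step that this sketch does not cover.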
>
> > I'm fairly confused about how your scheme is to work here,
> > but I'd assume that one of your shards happens to contain some
> > information to be used, so call upon the desired engine directly
> > within this function, i.e. engine.execute("select my_shard_id from
> > my_shard_table where foo='bar'").fetchall() .   I don't quite get how
> > sharding is going to help you here in the first place;  you aren't
> > getting any performance/clustering advantages (since you're relying on
> > a big slow query every time to one monolithic database) and you aren't
> > saving on rows, either (since you have one monolithic table with a row
> > for every piece of data in all the other databases).
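The suggestion above amounts to resolving the shard with a plain query against the lookup database, outside any ShardedSession, and returning only the shard id. The sketch below uses the standard-library `sqlite3` module as a stand-in for a SQLAlchemy engine so that it runs on its own; the table name `lookup`, its columns, and the `Post` class are assumptions for illustration. With SQLAlchemy the same SELECT would go through the `engine.execute(...)` call quoted above (or, in current versions, a connection from `engine.connect()`).

```python
import sqlite3

# Stand-in for one lookup bucket's database; a real setup would use the
# engine for that bucket instead.
lookup_db = sqlite3.connect(":memory:")
lookup_db.execute("CREATE TABLE lookup (username TEXT PRIMARY KEY, shardname TEXT)")
lookup_db.executemany(
    "INSERT INTO lookup (username, shardname) VALUES (?, ?)",
    [("alice", "shard1"), ("bob", "shard2")],
)


def shard_chooser(mapper, instance, clause=None):
    """Return the shard id for `instance` without going through a ShardedSession.

    The lookup is a plain parameterized SELECT; only its result (a shard
    name) is handed back for the session to use.
    """
    row = lookup_db.execute(
        "SELECT shardname FROM lookup WHERE username = ?",
        (instance.username,),
    ).fetchone()
    if row is None:
        raise LookupError("no shard mapping for %r" % instance.username)
    return row[0]


class Post:  # hypothetical mapped class carrying a username attribute
    def __init__(self, username):
        self.username = username


print(shard_chooser(None, Post("bob")))  # shard2
```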
>
> I am not sure what you mean here. The only monolithic database you may be
> referring to is the lookup table, which is not the case: it has been split
> into buckets. Sorry, I forgot to mention this in my earlier post. Every
> other table is sharded horizontally. The only reason for opting out of
> foreign keys is that they become useless when you want to take a table out
> of the current shard and place it into its own vertical partition, and vice
> versa. Foreign keys don't work across vertical shards.
>
> Here is the complete code for what I am trying to do. I know it is a bit
> long, but it may be helpful for those who come later searching for sharding.
>
> One more question I have is about ShardedSessions. When the user, post and
> comment tables are sharded by username, which is looked up dynamically in
> the lookup table, does each table require a separate ShardedSession?
> Static/fixed sharding can be achieved by hard-coding the modulo hash
> function and having one ShardedSession for all tables that are sharded by
> one common field such as username. From what I have read on sharding, many
> tend to recommend dynamic sharding.
> ========================================
> blog_engine.py (Setting up the database engine):
>
> ========================================
> setup.py (create shards and bucket tables: lookup, post):
>
> Run this first to create the buckets and shards in the database. Then, to
> populate each table with data, run lookup_data.py and post_data.py.
> ========================================
> lookup_config.py (bucketing configuration for lookup table):
>
>
> ========================================
> post_config.py (sharding configuration for post table):
>
> ========================================
> lookup.py (Lookup table model definition):
>
> ========================================
> post.py (Post table model definition):
>
> ========================================
> lookup_data.py (load sample data into the buckets of the lookup table):
>
> ========================================
> post_data.py (load sample data into the shards of the post table. Before
> inserting records into the Post table, the username key is used to
> determine the bucket where the {username, shardname} mapping is stored; the
> script then goes to that bucket, gets the {username, shardname} record, and
> inserts the Post record into the shard named by shardname):
>
> ========================================
>
> My problem mainly lies in the query_chooser and id_chooser functions in
> post_config.py and lookup_config.py.
>
> Thank you; I hope this outlines what I am trying to do.
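For the lookup buckets, where the username-to-bucket mapping is a fixed modulo, the id and query choosers can be computed without any database access. A minimal sketch, assuming hypothetical bucket ids `bucket0` to `bucket3` and the chooser signatures ShardedSession used at the time (`id_chooser(query, ident)` returning a list of candidate shard ids, `query_chooser(query)` returning the shard ids to search); newer SQLAlchemy releases replace `id_chooser` with `identity_chooser`, so check the version in use. `query_chooser` here simply returns every bucket, since narrowing by the query's criteria is out of scope.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical number of lookup buckets
BUCKET_IDS = ["bucket%d" % i for i in range(NUM_BUCKETS)]


def bucket_for(username):
    # Stable across processes, unlike the built-in hash() on str.
    return BUCKET_IDS[zlib.crc32(username.encode("utf-8")) % NUM_BUCKETS]


def id_chooser(query, ident):
    """Primary-key get(): the key is the username, so exactly one bucket
    can hold the row. Return it as a one-element list."""
    username = ident[0]
    return [bucket_for(username)]


def query_chooser(query):
    """Arbitrary query: without inspecting its criteria we cannot narrow
    the search, so every bucket is a candidate."""
    return list(BUCKET_IDS)


print(id_chooser(None, ["alice"]))
print(query_chooser(None))  # ['bucket0', 'bucket1', 'bucket2', 'bucket3']
```

For the post table the same trick only works if the key passed to get() carries the username; with a surrogate post id alone, id_chooser has no way to pick a shard and would have to return all of them.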

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---
