I forgot to attach my_metadata.py, here is the content:
from sqlalchemy import MetaData

a_metadata = MetaData()
b_metadata = MetaData()

On Fri, Jul 4, 2008 at 7:34 PM, Jay Decker <[EMAIL PROTECTED]> wrote:

> > it's nonsensical to call upon the ShardedSession *within* the
> > query_chooser def. The ShardedSession can't do a query without a
> > return value from the chooser, so that it knows which engine to
> > query.
>
> I forgot to mention that my lookup table is also sharded (in other words,
> the lookup table is split into buckets). Before I insert data into the
> Post table (done via post_data.py), I determine which lookup table bucket
> to go to and retrieve the name of the shard the user belongs to. That is
> the reason I have done this inside def shard_chooser_post of module
> post_config.py:
>
> # sesslk is a ShardedSession for the lookup table, which is split into buckets.
> sesslk = create_session_lookup()
> def shard_chooser_post(mapper, instance, clause=None):
>
>     querylk = sesslk.query(Lookup)
>     # Determines which bucket of the lookup table to use by applying a
>     # hash modulo to instance.username, and gets the {username, shard}
>     # mapping.
>     lk_rec = querylk.get([instance.username])
>
> Above, I am using sesslk (the lookup table's sharded session) to hit one of
> the lookup table buckets using instance.username and to get the record with
> the {username, shardname} mapping. For the lookup table, the mapping from
> username to bucket is fixed by a modulo. But the other tables (user, post,
> comment) are sharded dynamically by the {username, shardname} mapping.
> Having a lookup table allows resharding dynamically later, and particular
> users can be moved between shards. I know Hibernate does this with virtual
> shards (but Hibernate's virtual shard count is fixed up front).
>
> > I'm fairly confused about how your scheme is to work here,
> > but I'd assume that one of your shards happens to contain some
> > information to be used, so call upon the desired engine directly
> > within this function, i.e. engine.execute("select my_shard_id from
> > my_shard_table where foo='bar'").fetchall().
> > I don't quite get how sharding is going to help you here in the
> > first place; you aren't getting any performance/clustering
> > advantages (since you're relying on a big slow query every time to
> > one monolithic database) and you aren't saving on rows, either
> > (since you have one monolithic table with a row for every piece of
> > data in all the other databases).
>
> I am not sure what you mean here. The only monolithic database you may be
> referring to is the lookup table, but it is not monolithic: it has been
> split into buckets. Sorry, I forgot to mention this in my earlier post.
> All the other tables are sharded horizontally. The only reason for opting
> out of foreign keys was that they become useless when you want to take a
> table out of its current shard and place it into its own vertical
> partition, or vice versa. Foreign keys don't work across vertical shards.
>
> Here is the complete code for what I am trying to do. I know this is a bit
> long, but it may be helpful for those who come later searching for
> sharding.
>
> One more question I have is regarding ShardedSessions. When the user,
> post, and comment tables are sharded based on a username that is
> dynamically looked up in the lookup table, does each table require a
> separate ShardedSession? Static/fixed sharding can be achieved by
> hard-coding the modulo hash function and having one ShardedSession for all
> tables that are sharded by one common field such as username. After
> reading up on sharding, I found that many tend to recommend dynamic
> sharding.
> ========================================
> blog_engine.py (Setting up the database engine):
>
> ========================================
> setup.py (create shards and bucket tables: lookup, post):
>
> Run this first to create the buckets and shards on the db.
> Then, to populate the data for each table, run lookup_data.py and
> post_data.py.
> ========================================
> lookup_config.py (bucketing configuration for lookup table):
>
>
> ========================================
> post_config.py (sharding configuration for post table):
>
> ========================================
> lookup.py (Lookup table model definition):
>
> ========================================
> post.py (Post table model definition):
>
> ========================================
> lookup_data.py (Load sample data into buckets of lookup tables):
>
> ========================================
> post_data.py (Load sample data into shards of post tables. Before
> inserting records into the Post table, the username key is used to
> determine the bucket where the {username, shardname} mapping is stored;
> the {username, shardname} record is then fetched from that bucket, and
> the Post record is inserted into the shard with that shardname):
>
> ========================================
>
> My problem mainly lies in the query_chooser and id_chooser functions in
> post_config.py and lookup_config.py.
>
> Thank you, hope this outlines what I am trying to do.
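
For anyone who finds this thread later: the two-level scheme described above (a fixed hash-modulo mapping from username to lookup bucket, then a dynamic {username, shardname} record inside that bucket) can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the attachments: in-memory SQLite connections stand in for the lookup bucket engines, and NUM_LOOKUP_BUCKETS, bucket_index, assign_shard, lookup_shard and the small Post class are hypothetical names. Only the shard_chooser_post(mapper, instance, clause=None) signature is taken from the message. The chooser talks to the bucket connection directly rather than going through a second sharded session.

```python
import sqlite3
import zlib

# Assumption: two in-memory SQLite databases stand in for the lookup-table
# buckets. Real code would use the engines defined in blog_engine.py.
NUM_LOOKUP_BUCKETS = 2
lookup_buckets = [sqlite3.connect(":memory:") for _ in range(NUM_LOOKUP_BUCKETS)]
for conn in lookup_buckets:
    conn.execute(
        "CREATE TABLE lookup (username TEXT PRIMARY KEY, shardname TEXT NOT NULL)"
    )


def bucket_index(username):
    """Fixed mapping: a stable hash of the username modulo the bucket count."""
    return zlib.crc32(username.encode("utf-8")) % NUM_LOOKUP_BUCKETS


def assign_shard(username, shardname):
    """Record (or change) the shard a user lives on; this is the dynamic part."""
    conn = lookup_buckets[bucket_index(username)]
    conn.execute(
        "INSERT OR REPLACE INTO lookup (username, shardname) VALUES (?, ?)",
        (username, shardname),
    )
    conn.commit()


def lookup_shard(username):
    """Fetch the shard name for a user from that user's lookup bucket."""
    conn = lookup_buckets[bucket_index(username)]
    row = conn.execute(
        "SELECT shardname FROM lookup WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        raise LookupError("no shard assigned for user %r" % (username,))
    return row[0]


def shard_chooser_post(mapper, instance, clause=None):
    """Same signature as in the message; queries the bucket directly."""
    return lookup_shard(instance.username)


class Post:
    """Hypothetical minimal stand-in for the mapped Post class."""

    def __init__(self, username):
        self.username = username


if __name__ == "__main__":
    assign_shard("alice", "shard_a")
    print(shard_chooser_post(None, Post("alice")))
    # Moving a user later only requires rewriting the lookup record.
    assign_shard("alice", "shard_b")
    print(shard_chooser_post(None, Post("alice")))
```

Because the bucket mapping never changes, the lookup step itself needs no lookup; only the {username, shardname} record has to be rewritten when a user is moved to another shard (the user's rows would of course have to be copied separately).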