On Mar 30, 4:42 pm, "Michael Bayer" <mike...@zzzcomputing.com> wrote: > George V. Reilly wrote: > > We're using SQLAlchemy sharding to partition accounts across a couple > > of databases. We want to add more partitions, but first we need to > > eliminate some unnecessary cross-partition queries. > > > > This works well most of the time, but we're finding that some queries > > do not have a "value". These are all of the form > > > > SELECT shopping_list_items.version AS shopping_list_items_version > > FROM shopping_list_items > > WHERE shopping_list_items.account_id = :param_1 > > AND shopping_list_items.shopping_list_item_id = :param_2 > > > > and :param1 is of the form _BindParamClause(u'%(63636624 param)s', > > None, type_=UUID()) > > > > Typically, I'm seeing this come out of the innards of SQLAlchemy, > > as one of several queries triggered by, say, a session.merge(). > > The only Query() I can see getting generated that would have non-valued > bindparams would be during a _get(). The values should be present in > query._params. If you need more info I can dig in to recall how the keys > of that dictionary are formatted in this case.
Thanks, Michael. Here's what I came up with: class FindShardableId(sqlalchemy.sql.ClauseVisitor): def __init__(self, ids, key_fields, get_shard, params): self.ids = ids self.key_fields = key_fields self.get_shard = get_shard self.params = params def visit_binary(self, binary): if not self._check_side(binary, binary.left, binary.right): # Lazy load properties tend to be reversed, with the constant on the left self._check_side(binary, binary.right, binary.left) def _check_side(self, binary, side, other_side): if isinstance(side, sqlalchemy.Column) and side.name in self.key_fields: if binary.operator == sqlalchemy.sql.operators.eq: value = getattr(other_side, "value", None) if (value is None and isinstance(other_side, sqlalchemy.sql.expression._BindParamClause)): value = self.params.get(other_side.key) if value is not None: self.ids.append(self.get_shard(value)) return True elif binary.operator == sqlalchemy.sql.operators.in_op: for bind in other_side.clauses: self.ids.append(self.get_shard(bind.value)) return True class QuerySharder(object): def sessionmaker(self, **sessionmaker_args): Session = sqlalchemy.orm.sessionmaker( class_ = sqlalchemy.orm.shard.ShardedSession, **sessionmaker_args) Session.configure( shards=self.shards, shard_chooser=self._shard_chooser, id_chooser=self._id_chooser, query_chooser=self._query_chooser) return Session def _query_chooser(self, query): ids = [] if query._criterion is not None: FindShardableId( ids, set(["account_id", "account_guid"]), lambda account_id: self.shard_manager.shard_id_from_guid(account_id), query._params ).traverse(query._criterion) if len(ids) == 0: logging.warn("\n\n! Executing query against all shards; " "this may not be optimal:\n\t{0}\n \tParams: {1}\n".format( str(query), str(query._params))) return self.shards.keys() else: return ids I really don't like the isinstance(other_side, sqlalchemy.sql.expression._BindParamClause)) in the middle of _check_side. Is there a cleaner way to do this? I found that a combination of * the above _check_side and two-sided visit_binary * doing a better job of declaring ForeignKey relationships in Columns * some explicit primaryjoins in calls to relation() cleaned up all the cases where SA wasn't providing the ids in queries Perhaps the sharding sample in SA 0.6 could be expanded? -- /George V. Reilly, Seattle -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.