On Mar 30, 4:42 pm, "Michael Bayer" <mike...@zzzcomputing.com> wrote:
> George V. Reilly wrote:
> > We're using SQLAlchemy sharding to partition accounts across a couple
> > of databases. We want to add more partitions, but first we need to
> > eliminate some unnecessary cross-partition queries.
> >
> > This works well most of the time, but we're finding that some queries
> > do not have a "value". These are all of the form
> >
> >     SELECT shopping_list_items.version AS shopping_list_items_version
> >     FROM shopping_list_items
> >     WHERE shopping_list_items.account_id = :param_1
> >         AND shopping_list_items.shopping_list_item_id = :param_2
> >
> > and :param1 is of the form _BindParamClause(u'%(63636624 param)s',
> > None, type_=UUID())
> >
> > Typically, I'm seeing this come out of the innards of SQLAlchemy,
> > as one of several queries triggered by, say, a session.merge().
>
> The only Query() I can see getting generated that would have non-valued
> bindparams would be during a _get().  The values should be present in
> query._params.   If you need more info I can dig in to recall how the keys
> of that dictionary are formatted in this case.

Thanks, Michael. Here's what I came up with:

    class FindShardableId(sqlalchemy.sql.ClauseVisitor):
        def __init__(self, ids, key_fields, get_shard, params):
            self.ids = ids
            self.key_fields = key_fields
            self.get_shard = get_shard
            self.params = params

        def visit_binary(self, binary):
            if not self._check_side(binary, binary.left,
binary.right):
                # Lazy load properties tend to be reversed, with the
constant on the left
                self._check_side(binary, binary.right, binary.left)

        def _check_side(self, binary, side, other_side):
            if isinstance(side, sqlalchemy.Column) and side.name in
self.key_fields:
                if binary.operator == sqlalchemy.sql.operators.eq:
                    value = getattr(other_side, "value", None)
                    if (value is None and
                            isinstance(other_side,
sqlalchemy.sql.expression._BindParamClause)):
                        value = self.params.get(other_side.key)
                    if value is not None:
                        self.ids.append(self.get_shard(value))
                        return True
                elif binary.operator ==
sqlalchemy.sql.operators.in_op:
                    for bind in other_side.clauses:
                        self.ids.append(self.get_shard(bind.value))
                    return True

    class QuerySharder(object):
        def sessionmaker(self, **sessionmaker_args):
            Session = sqlalchemy.orm.sessionmaker(
                class_ = sqlalchemy.orm.shard.ShardedSession,
**sessionmaker_args)
            Session.configure(
                shards=self.shards,
                shard_chooser=self._shard_chooser,
                id_chooser=self._id_chooser,
                query_chooser=self._query_chooser)
            return Session

        def _query_chooser(self, query):
            ids = []
            if query._criterion is not None:
                FindShardableId(
                        ids,
                        set(["account_id", "account_guid"]),
                        lambda account_id:
self.shard_manager.shard_id_from_guid(account_id),
                        query._params
                    ).traverse(query._criterion)
            if len(ids) == 0:
                logging.warn("\n\n! Executing query against all
shards; "
                             "this may not be optimal:\n\t{0}\n
\tParams: {1}\n".format(
                                 str(query), str(query._params)))
                return self.shards.keys()
            else:
                return ids

I really don't like the
    isinstance(other_side,
sqlalchemy.sql.expression._BindParamClause))
in the middle of _check_side. Is there a cleaner way to do this?

I found that a combination of
* the above _check_side and two-sided visit_binary
* doing a better job of declaring ForeignKey relationships in Columns
* some explicit primaryjoins in calls to relation()
cleaned up all the cases where SA wasn't providing the ids in queries

Perhaps the sharding sample in SA 0.6 could be expanded?
--
/George V. Reilly, Seattle

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to