Sorry for the messed up word wrap. I find this Google Groups web editor hard to use.
On Wednesday, August 22, 2018 at 6:03:07 PM UTC+8, Carson Ip wrote: > > This is my first post here. > > Software / Library versions: (tho unrelated) > sqlalchemy version: 1.0.19 > db: MySQL 5.6 > db driver: mysqlclient 1.3.7 > > Background: > I have 6 databases with a total of hundreds of shards. I realize when I > add more database hosts (and shards), my ShardedSession which queries all > the shards becomes slower. I looked into ShardedQuery and understand that > things run in serial. I am trying to add some gevent magic to parallelize > the queries. Please excuse my application design which did not allow me to > get the shard id by looking at the query target. > > My proposal: > > from gevent.pool import Group > class GeventShardedQuery(ShardedQuery): > def _execute_and_instances(self, context): > def iter_for_shard(shard_id): > context.attributes['shard_id'] = shard_id > result = self._connection_from_session( > mapper=self._mapper_zero(), > shard_id=shard_id).execute( > context.statement, > self._params) > return self.instances(result, context) > > if self._shard_id is not None: > return iter_for_shard(self._shard_id) > else: > g = Group() > values = [g.spawn(iter_for_shard, shard_id) for shard_id in > self.query_chooser(self)] > g.join() > partial = [] > for v in values: > partial.extend(v.value) > > # if some kind of in memory 'sorting' > # were done, this is where it would happen > return iter(partial) > > > Observation: > In my benchmark, it will significantly (20-30% gain) speed up some > queries. I also see some cross-shard queries with no gain (that one is with > eager-loading). > > Here comes my questions: > 1. Is it a good idea to actually parallelizing the queries? Will it mess > with some of the internals (e.g. sqlalchemy query context, db driver)? > Although I'm quite sure mysqlclient is ok with this. > 2. > context.attributes['shard_id'] = shard_id > assigning context.attributes['shard_id'] like this no longer works because > it will be overwritten by another greenlet because of the parallelization. > I suppose it is not a big deal to me because I don't need it anyway. But is > there an easy way to workaround this? > 3. Is there any other problems with my code? Now I will create a > ShardedSession with the query_cls set as GeventShardedQuery. > > Thanks in advance! > > -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.