Re: [sqlalchemy] Performance issues with multi-shard (engines) sharing the same bakery / LRUCache

Mike Bayer Tue, 26 Nov 2019 06:54:21 -0800


On Tue, Nov 26, 2019, at 3:01 AM, Carson Ip wrote:
> Hi Mike,
> 
> Thanks for the swift response.
> 
> Then I guess the easiest fix for me is to modify the cache key composition to 
> replace the dialect object part with a dialect summary (e.g. type + version) 
> that is the same between all engines. But it requires a guarantee that all 
> compiled SQL queries must not contain database (schema) name.
> 
> Having a single connection pool that is shared among all shards on a host is 
> cool (i think we have discussed this in the past in another thread), but is 
> it readily available at the moment? Also, if I go with this solution only, it 
> still requires a LRU cache capacity of (number of hosts * number of queries).



hi there -

so yes, I just looked at the source for schema translate, and apparently I did 
the wrong thing here by making this an inline-compiled element, which while it 
is part of the cache key, does not solve the problem of creating N keys. so 
https://github.com/sqlalchemy/sqlalchemy/issues/5004 is also added and in 1.4 I 
will definitely be changing this so that there is only one cache key per SQL 
regardless of schema translate. good thing this got pointed out. 




> 
> Thanks,
> Carson
> 
> 
> 
> On Tuesday, November 26, 2019 at 1:33:24 AM UTC+8, Mike Bayer wrote:
>> 
>> 
>> On Mon, Nov 25, 2019, at 4:40 AM, Carson Ip wrote:
>>> *Background:*
>>> **
>>> I am using a multi-shard MySQL setup (multiple db hosts, each host holding 
>>> many databases, a.k.a. "shards"). My Python application is creating engines 
>>> to many of these shards. For performance reasons, the application utilizes 
>>> bakery and BakedQuery to avoid compiling SQL statements on every ORM call.
>>> 
>>> *Issue:*
>>> **
>>> I have 100+ BakedQuery and 100+ shards. It appears it needs a bakery 
>>> (LRUCache) of size = (N BakedQuery * M shards) to cache all of the queries 
>>> for all shards because the cache key for the compiled SQL (not the 
>>> BakedQuery) contains the dialect object. Please see _execute_clauseelement 
>>> in sqlalchemy.engine.base. 
>>> 
>>> key = (
>>>     dialect,
>>>     elem,
>>>     tuple(sorted(keys)),
>>>     self.schema_for_object.hash_key,
>>>     len(distilled_params) > 1,
>>> )
>>> compiled_sql = self._execution_options[*"compiled_cache"*].get(key)
>>> *if *compiled_sql *is *None:
>>>     compiled_sql = elem.compile(
>>>         dialect=dialect,
>>>         column_keys=keys,
>>>         inline=len(distilled_params) > 1,
>>>         schema_translate_map=self.schema_for_object
>>>         *if not *self.schema_for_object.is_default
>>>         *else *None,
>>>     )
>>>     self._execution_options[*"compiled_cache"*][key] = compiled_sql
>>> 
>>> When the cache capacity is small, it keeps evicting cache entries and 
>>> compiling queries, using a lot of CPU. Also if it takes N*M for cache 
>>> capacity, it is bad for memory.
>>> 
>>> *Question:*
>>> 
>>> 1. Any good suggestions on fixing the performance and memory issue? e.g. by 
>>> sharing the cache key
>>> 2. To share the cache key, do we implement a __eq__ for the Dialect object 
>>> or make all the shards (engines) share the same Dialect object?
>>> 3. Is sharing a Dialect object dangerous? I see default_schema_name in the 
>>> Dialect object.
>>> 3. This not only affects bakedquery, but also compiled_cache of engines 
>>> (non-BakedQuery). Is there a good universal fix for the issue (like Q2, 
>>> sharing dialect)?
>> 
>> Hi there -
>> 
>> yeah I dont think this is a case that is anticipated right now by any of the 
>> cache-related systems since "dialect" is part of the key. dialect is there 
>> because it impacts how the SQL might be emitted based on options and so 
>> forth.
>> 
>> A longer term solution (definitely not for 1.3.x) would be that dialects 
>> produce part of the cache key based on the type of dialect and the server 
>> version info, as well as any options that may affect SQL output. There is a 
>> new cache key mechanism going into 1.4 that will be targeted for mainstream 
>> use by the SQLAlchemy 2.0 series and I've added 
>> https://github.com/sqlalchemy/sqlalchemy/issues/5002 to ensure this aspect 
>> of it is dealt with.
>> 
>> For now, it is mostly safe to share the dialect object, with the exception 
>> of the "default schema name" which will have significance for table 
>> reflection operation. if you aren't using table reflection then you could in 
>> theory share the dialect.
>> 
>> I would likely look first to seeing if there is a way to use a single engine 
>> per host, and then to use the schema translation feature so that the full 
>> set of shards per host are available under one engine: 
>> https://docs.sqlalchemy.org/en/13/core/connections.html?highlight=schema_translate_map#schema-translating
>>  .
>> 
>> This is what you should likely be doing in any case as it would allow you to 
>> have a single connection pool that is shared for all shards on a host.
>> 
>> 
>> 
>> 
>> 
>>> 
>>> 
>>> 

>>> --
>>> SQLAlchemy - 
>>> The Python SQL Toolkit and Object Relational Mapper
>>> 
>>> http://www.sqlalchemy.org/
>>> 
>>> To post example code, please provide an MCVE: Minimal, Complete, and 
>>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>>> description.
>>> --- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "sqlalchemy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to sqlal...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/sqlalchemy/d1e7f3ce-4cd8-4d9d-b774-44175a4e523b%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/sqlalchemy/d1e7f3ce-4cd8-4d9d-b774-44175a4e523b%40googlegroups.com?utm_medium=email&utm_source=footer>.
>> 
> 

> --
>  SQLAlchemy - 
>  The Python SQL Toolkit and Object Relational Mapper
> 
> http://www.sqlalchemy.org/
> 
>  To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
>  --- 
>  You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
>  To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
>  To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/65159d52-9ea7-4e35-96b8-d0215530939d%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/sqlalchemy/65159d52-9ea7-4e35-96b8-d0215530939d%40googlegroups.com?utm_medium=email&utm_source=footer>.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/9bd79150-32e6-4729-bdce-b77f7a49cb81%40www.fastmail.com.

Re: [sqlalchemy] Performance issues with multi-shard (engines) sharing the same bakery / LRUCache

Reply via email to