Re: [sqlalchemy] Slow entities pickling for caching purposes

Alexander Egorov Fri, 11 Oct 2019 22:20:17 -0700

I've considered manual queries and building classes by myself. This option 
is good in terms of performance, but it requires a lot of code duplication 
and kills code reuse. So I want to use the same mapper classes for both 
read and modify operations. Also, I want to have the same query 
description. I've tried to build classes using Query.values and calling 
constructor and it gave ~3 times boost. What I want is to have the same 
functionality in generic form, without need to manually write these loops. 
Pseudocode to illustrate my probably crazy idea:
session.query(Entity).all()  # loads entities with modification tracking, 
with _sa_instance_state filled and added to the session.
session.query(Entity).readonly().all()  # loads entities without 
modification tracking, without _sa_instance_state and not added to the 
session.


# Probably it would be useful in cases when the Entity has a read-only link 
to another entity via relationship.
session.query(Entity).options(sa.orm.joinedload(Entity.another_entity, 
readonly_mapping=True)).get(123)

PyPy is an option I'm considering too. Its pickle module seems to be 
significantly slower, however. I need 3.6 version which is in beta now, so 
I have to wait.
Name (time in ms)                     PyPy 3.5           CPython 3.6.3
----------------------------------------------------------------------
test_unpickle_dicts                   12.2969 (1.0)      2.3764
test_load_from_database               26.4408 (2.15)    91.3482
test_unpickle_classes                 30.7339 (2.50)     9.4188
----------------------------------------------------------------------



пятница, 11 октября 2019 г., 20:52:57 UTC+7 пользователь Mike Bayer написал:
>
>
>
> On Fri, Oct 11, 2019, at 5:23 AM, Александр Егоров wrote:
>
> Thank you very much for your answers!
>
> Some more questions
> 1) I see you use sa.inspect(self).dict in __getstate__. Is it safe to 
> replace it with self.__dict__? I didn't find any difference in the result, 
> but __dict__ works 30 times faster.
>
>
> there's no difference.   there's an old extension that allows objects to 
> persist in some form other than __dict__ but I'm sure you're not using it.
>
>
> 2) I'm going to implement __getstate__ and __setstate__ in a Base class. 
> Is it a risky idea from your point of view?
>
>
> not really, pickle is already flaky enough
>
>
>
> 3) We've greatly optimized the unpickling for read-only entities. Is it 
> possible to do the same during loading from the database? When I read data, 
> I don't need ORM state, because I'm not going to modify the data and don't 
> want lazy loads.
>
>
> I would recommend using a plain result object from Query.statement or 
> Core.statement and build up the objects yourself in that case.     the "ORM 
> state" you refer towards is intrinsic to everything the ORM does including 
> all the relationship loading, etc.    you'd need to re-implement all of 
> that yourself.
>
> Since your performance requirements appear to be well beyond the norm have 
> you considered using pypy ?   This would appear to be a much simpler way to 
> double or triple your performance without building your own ORM.
>
>
>
> On Friday, 11 October 2019 00:34:03 UTC+7, Mike Bayer wrote:
>
>
>
> On Thu, Oct 10, 2019, at 2:48 AM, Александр Егоров wrote:
>
> Hello!
>
> I want to cache query results in a distributed cache (Redis), so I need to 
> serialize/deserialize fetched entities very fast.
> However, it turned that SQLAlchemy entities are very heavy for pickle to 
> dump/load. Regular classes and dicts with the same structure takes 
> significantly less amount of time to reconstruct from bytes.
> It's even slower than fetching and mapping the entities from a database.
>
> Here are benchmark results, loading 10000 simple entities:
>
> *class *Entity(Base):
>     __tablename__ = *'test_entity'*
>     id = sa.Column(sa.Integer, primary_key=*True*, autoincrement=*True*)
>     field1 = sa.Column(sa.String)
>     field2 = sa.Column(sa.Integer)
>
>
>
> Name (time in ms)                      Median            StdDev           
> Dump bytes
> --------------------------------------------------------------------------------------------------------
> test_unpickle_dicts                    2.3764 (1.0)      0.0797 (1.09)      
> 337806
> test_unpickle_dataclasses              9.2546 (3.89)     0.0734 (1.0)       
> 437861
> test_unpickle_classes                  9.4188 (3.96)     0.1230 (1.68)      
> 437862
> test_load_from_database               91.3482 (38.44)    0.6686 (9.10)
> test_unpickle_unsaved_entities       108.6726 (45.73)    0.5154 (7.02)     
> 1448169
> test_unpickle_persisted_entities     166.4531 (70.04)    0.3787 (5.16)     
> 1697849
> ---------------------------------------------------------------------------------------------------------
>
>
> *Environment*:
> OS: Ubuntu 16.04
> Python: CPython 3.6.3
> SQLAlchemy: 1.3.0
> Database: in-memory SQLite, PostgreSQL (all gives similar results)
> The full benchmark code is in a single attachment file.
> To run: 
> pytest test_ser.py --benchmark-columns=median,stddev --benchmark-disable-
> gc
> Requirements:
> pip install sqlalchemy==1.3.0 pytest==3.6.0 pytest-benchmark==3.1.1 
> dataclasses==0.6
>
> *Questions:*
> 1) What strategy for caching I should consider while using SQLAlchemy? 
> Currently, the only option I see is to have a duplicated declaration of 
> entities in a form of simple classes and use it when I don't need 
> modification. Needles to say, it's a lot of code duplication.
> 2) Is it possible to have a read-only loader in SQLAlchemy? I can imagine 
> mapping loaded data to an entity class but without instrumentation overhead 
> which is not needed to read the data.
> If continue the idea further, I can imagine a situation when I load the 
> entity for modification, and it has a read-only relationship to another 
> entity. So that I don't pay for things I don't use.
>
> Probably I'm making some crazy things and all wrong, feel free to point me 
> out. My ultimate goal is to manage caching with SQLAlchemy and understand 
> SQLAlchemy more in this direction.
> Thank you in advance! 
>
>
> Hi there -
>
> there's no simple answer for this, have you considered writing your own 
> serializer for your objects rather than using pickle's default strategy?  
> that is, if you implement __getstate__ and __setstate__,  you can allow the 
> serialization to not include the "_sa_instance_state" that's associated 
> with the objects which is where you'd get some of the slowness on dumping / 
> loading, it also would save lots of memory / network overhead by not 
> storing lots of data that you can probably forego for storing in your cache.
>
> For bringing the classes back,  you can implement a setstate that returns 
> them as non-ORM classes, or patches them in as ORM classes.
>
> I've altered your test to include only ORM mapped classes, comparing your 
> baseline to one that uses __getstate__, and then several variants which 
> will grant more or less ORM functionality once you have unpickled them.    
> the very fastest one which omits any ORM functionality at all, and looks 
> like what you propose, has a mean of 26 ms compared to 380 ms for the most 
> expensive.  
>
> there's one ORM-level guarantee you would want me to add for this most 
> optimized version, that is if you have an ORM object that has no 
> _sa_instance_state at all, you can still access attributes on it.  This 
> works but I would want to add test support so that I keep this as a 
> supported use case.
>
>
>
>
>
>
>
>
>
> --
> SQLAlchemy - 
> The Python SQL Toolkit and Object Relational Mapper
>  
> http://www.sqlalchemy.org/
>  
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> --- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlal...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/75aa92b7-3c94-4b5a-afed-062a203792fd%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/sqlalchemy/75aa92b7-3c94-4b5a-afed-062a203792fd%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
>
> *Attachments:*
>
>    - test_sqla_pickle.py
>    
>
>
> --
> SQLAlchemy - 
> The Python SQL Toolkit and Object Relational Mapper
>  
> http://www.sqlalchemy.org/
>  
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> --- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlal...@googlegroups.com <javascript:>.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/e6b0c770-21c4-4672-b886-4a75e97f94f2%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/sqlalchemy/e6b0c770-21c4-4672-b886-4a75e97f94f2%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
>
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/2d9f8f5c-2c0e-4702-ba61-1dc6cef58780%40googlegroups.com.

Re: [sqlalchemy] Slow entities pickling for caching purposes

Reply via email to