Re: [sqlalchemy] Slow entities pickling for caching purposes

Александр Егоров Fri, 11 Oct 2019 02:23:15 -0700

Thank you very much for your answers!

Some more questions
1) I see you use sa.inspect(self).dict in __getstate__. Is it safe to 
replace it with self.__dict__? I didn't find any difference in the result, 
but __dict__ works 30 times faster.
2) I'm going to implement __getstate__ and __setstate__ in a Base class. Is 
it a risky idea from your point of view?
3) We've greatly optimized the unpickling for read-only entities. Is it 
possible to do the same during loading from the database? When I read data, 
I don't need ORM state, because I'm not going to modify the data and don't 
want lazy loads.


On Friday, 11 October 2019 00:34:03 UTC+7, Mike Bayer wrote:
>
>
>
> On Thu, Oct 10, 2019, at 2:48 AM, Александр Егоров wrote:
>
> Hello!
>
> I want to cache query results in a distributed cache (Redis), so I need to 
> serialize/deserialize fetched entities very fast.
> However, it turned that SQLAlchemy entities are very heavy for pickle to 
> dump/load. Regular classes and dicts with the same structure takes 
> significantly less amount of time to reconstruct from bytes.
> It's even slower than fetching and mapping the entities from a database.
>
> Here are benchmark results, loading 10000 simple entities:
>
> *class *Entity(Base):
>     __tablename__ = *'test_entity'*
>     id = sa.Column(sa.Integer, primary_key=*True*, autoincrement=*True*)
>     field1 = sa.Column(sa.String)
>     field2 = sa.Column(sa.Integer)
>
>
>
> Name (time in ms)                      Median            StdDev           
> Dump bytes
> --------------------------------------------------------------------------------------------------------
> test_unpickle_dicts                    2.3764 (1.0)      0.0797 (1.09)      
> 337806
> test_unpickle_dataclasses              9.2546 (3.89)     0.0734 (1.0)       
> 437861
> test_unpickle_classes                  9.4188 (3.96)     0.1230 (1.68)      
> 437862
> test_load_from_database               91.3482 (38.44)    0.6686 (9.10)
> test_unpickle_unsaved_entities       108.6726 (45.73)    0.5154 (7.02)     
> 1448169
> test_unpickle_persisted_entities     166.4531 (70.04)    0.3787 (5.16)     
> 1697849
> ---------------------------------------------------------------------------------------------------------
>
>
> *Environment*:
> OS: Ubuntu 16.04
> Python: CPython 3.6.3
> SQLAlchemy: 1.3.0
> Database: in-memory SQLite, PostgreSQL (all gives similar results)
> The full benchmark code is in a single attachment file.
> To run: 
> pytest test_ser.py --benchmark-columns=median,stddev --benchmark-disable-
> gc
> Requirements:
> pip install sqlalchemy==1.3.0 pytest==3.6.0 pytest-benchmark==3.1.1 
> dataclasses==0.6
>
> *Questions:*
> 1) What strategy for caching I should consider while using SQLAlchemy? 
> Currently, the only option I see is to have a duplicated declaration of 
> entities in a form of simple classes and use it when I don't need 
> modification. Needles to say, it's a lot of code duplication.
> 2) Is it possible to have a read-only loader in SQLAlchemy? I can imagine 
> mapping loaded data to an entity class but without instrumentation overhead 
> which is not needed to read the data.
> If continue the idea further, I can imagine a situation when I load the 
> entity for modification, and it has a read-only relationship to another 
> entity. So that I don't pay for things I don't use.
>
> Probably I'm making some crazy things and all wrong, feel free to point me 
> out. My ultimate goal is to manage caching with SQLAlchemy and understand 
> SQLAlchemy more in this direction.
> Thank you in advance! 
>
>
> Hi there -
>
> there's no simple answer for this, have you considered writing your own 
> serializer for your objects rather than using pickle's default strategy?  
> that is, if you implement __getstate__ and __setstate__,  you can allow the 
> serialization to not include the "_sa_instance_state" that's associated 
> with the objects which is where you'd get some of the slowness on dumping / 
> loading, it also would save lots of memory / network overhead by not 
> storing lots of data that you can probably forego for storing in your cache.
>
> For bringing the classes back,  you can implement a setstate that returns 
> them as non-ORM classes, or patches them in as ORM classes.
>
> I've altered your test to include only ORM mapped classes, comparing your 
> baseline to one that uses __getstate__, and then several variants which 
> will grant more or less ORM functionality once you have unpickled them.    
> the very fastest one which omits any ORM functionality at all, and looks 
> like what you propose, has a mean of 26 ms compared to 380 ms for the most 
> expensive.  
>
> there's one ORM-level guarantee you would want me to add for this most 
> optimized version, that is if you have an ORM object that has no 
> _sa_instance_state at all, you can still access attributes on it.  This 
> works but I would want to add test support so that I keep this as a 
> supported use case.
>
>
>
>
>
>
>
>
>
> --
> SQLAlchemy - 
> The Python SQL Toolkit and Object Relational Mapper
>  
> http://www.sqlalchemy.org/
>  
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> --- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlal...@googlegroups.com <javascript:>.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/75aa92b7-3c94-4b5a-afed-062a203792fd%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/sqlalchemy/75aa92b7-3c94-4b5a-afed-062a203792fd%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
>
> *Attachments:*
>
>    - test_sqla_pickle.py
>    
>
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/e6b0c770-21c4-4672-b886-4a75e97f94f2%40googlegroups.com.

Re: [sqlalchemy] Slow entities pickling for caching purposes

Reply via email to