Hello! I want to cache query results in a distributed cache (Redis), so I need to serialize/deserialize fetched entities very fast. However, it turned that SQLAlchemy entities are very heavy for pickle to dump/load. Regular classes and dicts with the same structure takes significantly less amount of time to reconstruct from bytes. It's even slower than fetching and mapping the entities from a database.
Here are benchmark results, loading 10000 simple entities: class Entity(Base): __tablename__ = 'test_entity' id = sa.Column(sa.Integer, primary_key=True, autoincrement=True) field1 = sa.Column(sa.String) field2 = sa.Column(sa.Integer) Name (time in ms) Median StdDev Dump bytes -------------------------------------------------------------------------------------------------------- test_unpickle_dicts 2.3764 (1.0) 0.0797 (1.09) 337806 test_unpickle_dataclasses 9.2546 (3.89) 0.0734 (1.0) 437861 test_unpickle_classes 9.4188 (3.96) 0.1230 (1.68) 437862 test_load_from_database 91.3482 (38.44) 0.6686 (9.10) test_unpickle_unsaved_entities 108.6726 (45.73) 0.5154 (7.02) 1448169 test_unpickle_persisted_entities 166.4531 (70.04) 0.3787 (5.16) 1697849 --------------------------------------------------------------------------------------------------------- *Environment*: OS: Ubuntu 16.04 Python: CPython 3.6.3 SQLAlchemy: 1.3.0 Database: in-memory SQLite, PostgreSQL (all gives similar results) The full benchmark code is in a single attachment file. To run: pytest test_ser.py --benchmark-columns=median,stddev --benchmark-disable-gc Requirements: pip install sqlalchemy==1.3.0 pytest==3.6.0 pytest-benchmark==3.1.1 dataclasses==0.6 *Questions:* 1) What strategy for caching I should consider while using SQLAlchemy? Currently, the only option I see is to have a duplicated declaration of entities in a form of simple classes and use it when I don't need modification. Needles to say, it's a lot of code duplication. 2) Is it possible to have a read-only loader in SQLAlchemy? I can imagine mapping loaded data to an entity class but without instrumentation overhead which is not needed to read the data. If continue the idea further, I can imagine a situation when I load the entity for modification, and it has a read-only relationship to another entity. So that I don't pay for things I don't use. Probably I'm making some crazy things and all wrong, feel free to point me out. My ultimate goal is to manage caching with SQLAlchemy and understand SQLAlchemy more in this direction. Thank you in advance! -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/75aa92b7-3c94-4b5a-afed-062a203792fd%40googlegroups.com.
import pytest from dataclasses import dataclass from sqlalchemy.ext.declarative import declarative_base import pickle import sqlalchemy as sa Base = declarative_base() class Entity(Base): __tablename__ = 'test_entity' id = sa.Column(sa.Integer, primary_key=True, autoincrement=True) field1 = sa.Column(sa.String) field2 = sa.Column(sa.Integer) def __init__(self, id_: int, field1: str, field2: int): self.id = id_ self.field1 = field1 self.field2 = field2 def some_method(self) -> str: return self.field1 + str(self.field2) @pytest.fixture(scope='session') def dbengine(): engine = sa.create_engine('sqlite://') Base.metadata.create_all(engine) yield engine @pytest.fixture(scope='function') def session(dbengine): session_factory = sa.orm.sessionmaker(bind=dbengine) Session = sa.orm.scoped_session(session_factory) yield Session Session.query(Entity).delete() Session.remove() @dataclass class EntityData: id: int field1: str field2: int def some_method(self) -> str: return self.field1 + str(self.field2) class EntityClass: def __init__(self, id_: int, field1: str, field2: int): self.id = id_ self.field1 = field1 self.field2 = field2 def some_method(self) -> str: return self.field1 + str(self.field2) def generate_data(): for i in range(10000): yield i + 1, str(i), i * 2 def test_unpickle_unsaved_entities(benchmark): entities = [Entity(*data) for data in generate_data()] dump_unsaved_entities = pickle.dumps(entities) print(f'\nDump unsaved entities bytes {len(dump_unsaved_entities)}') benchmark(pickle.loads, dump_unsaved_entities) def test_unpickle_persisted_entities(session, benchmark): entities = [Entity(*data) for data in generate_data()] session.add_all(entities) session.flush() session.expunge_all() entities = session.query(Entity).all() dump_persisted_entities = pickle.dumps(entities) print(f'\nDump persisted entities bytes {len(dump_persisted_entities)}') benchmark(pickle.loads, dump_persisted_entities) def test_unpickle_dicts(benchmark): dicts = [ {'id': id_, 'field1': field1, 'field2': field2} for id_, field1, field2 in generate_data() ] dump_dicts = pickle.dumps(dicts) print(f'\nDump dicts bytes {len(dump_dicts)}') benchmark(pickle.loads, dump_dicts) def test_unpickle_dataclasses(benchmark): dataclasses = [EntityData(*data) for data in generate_data()] dump_dataclasses = pickle.dumps(dataclasses) print(f'\nDump dataclasses bytes {len(dump_dataclasses)}') benchmark(pickle.loads, dump_dataclasses) def test_unpickle_classes(benchmark): instances = [EntityClass(*data) for data in generate_data()] dump_classes = pickle.dumps(instances) print(f'\nDump classes bytes {len(dump_classes)}') benchmark(pickle.loads, dump_classes) def test_load_from_database(session, benchmark): entities = [Entity(*data) for data in generate_data()] session.add_all(entities) session.flush() def load_from_db(): session.expunge_all() loaded_entities = session.query(Entity).all() assert len(loaded_entities) == len(entities) benchmark(load_from_db)