Hi there!

We are considering using SQLAlchemy for a new project. Our first tests
look promising and SA is fun to use, but I still have a problem
implementing some special features we would like to have.

For example, I want to store large objects (tens to hundreds of MB) in
the database, but keep the option open to move them to the filesystem
later. I would prefer storing them in the DB to have them under
transaction protection, but OTOH I know that this can become a
performance problem. So I want to build an interface that allows
external (file or extra DB) storage later.
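
To make that concrete, here is a minimal sketch of the kind of backend
interface I have in mind. All names here (BlobStore, put, get) are made
up for illustration, not SQLAlchemy API; a filesystem or extra-DB
backend would implement the same two methods:

```python
import hashlib

class BlobStore(object):
    """Hypothetical backend interface (illustrative names, not SA API)."""
    def put(self, data):
        """Store data, return its content hash."""
        raise NotImplementedError
    def get(self, hashval):
        """Return the data stored under hashval."""
        raise NotImplementedError

class MemoryBlobStore(BlobStore):
    """In-memory stand-in for the 'blobs in the database' backend."""
    def __init__(self):
        self._blobs = {}
    def put(self, data):
        key = hashlib.md5(data).hexdigest()
        # importing identical content twice reuses the existing blob
        self._blobs.setdefault(key, data)
        return key
    def get(self, hashval):
        return self._blobs[hashval]

store = MemoryBlobStore()
k1 = store.put(b"payload")
k2 = store.put(b"payload")   # same content, same key, nothing stored twice
```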

So instead of the blob itself I want to store a cryptographic hash (as
git, Mercurial, Fossil SCM etc. do) and look up the real data by that
hash. If somebody tries to import the same file twice, the existing
blob should simply be reused (possibly with a sanity check that the
contents match).

The following is a greatly simplified example of that approach. It
works as shown, but I would like to do without the exception handler
at the end ;-)

------
import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.associationproxy import association_proxy
import hashlib

Base = declarative_base()

class CommonStorage(Base):
    __tablename__ = "common"
    hashval = sa.Column(sa.String, primary_key=True)
    value   = sa.Column(sa.LargeBinary)
    def __init__(self, v):
        self.value = v
        self.hashval = hashlib.md5(v).hexdigest()

class StorageUser(Base):
    __tablename__ = "user"
    id    = sa.Column(sa.Integer, primary_key=True)
    ref   = sa.Column(None, sa.ForeignKey(CommonStorage.hashval))
    rel   = orm.relation(CommonStorage)
    value = association_proxy("rel", "value")

engine = sa.create_engine("sqlite:///", echo=True)
Base.metadata.create_all(engine)
session = orm.sessionmaker(bind=engine, autoflush=False)()

ua, ub = StorageUser(), StorageUser()
ua.value = ub.value = "Something"
session.add(ua)
session.commit()
session.add(ub)
try:
    session.commit()
except sa.exc.FlushError:
    # I really like this error handling - we have to rollback first to
    # acknowledge the error. Cool! Never ignore errors and carry on again!
    session.rollback()
    ub.rel = session.query(CommonStorage).filter(
        CommonStorage.hashval == ub.rel.hashval).first()
    session.add(ub)
    session.commit()
-------
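
One variant I am also considering, to avoid the try/except entirely:
since the hash is the primary key, session.merge() should reuse an
existing row instead of tripping over a duplicate INSERT. A
self-contained sketch (untested against our real schema, and written
against a newer SQLAlchemy than the snippet above):

```python
import hashlib
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class CommonStorage(Base):
    __tablename__ = "common"
    hashval = sa.Column(sa.String, primary_key=True)
    value   = sa.Column(sa.LargeBinary)

def store_blob(session, data):
    # merge() looks the primary key up first; if a row with this hash
    # already exists, it is reused instead of a second INSERT failing
    blob = CommonStorage(hashval=hashlib.md5(data).hexdigest(), value=data)
    return session.merge(blob)

engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

a = store_blob(session, b"Something")
session.commit()
b = store_blob(session, b"Something")  # same content: merged into existing row
session.commit()
blob_count = session.query(CommonStorage).count()
```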

I tried using a MapperExtension on StorageUser, replacing the ref
inside before_insert and before_update by searching for a matching
hash:

-----
from sqlalchemy.orm import interfaces

class StorageExtension(interfaces.MapperExtension):
    def before_insert(self, mapper, connection, instance):
        if instance.rel is not None:
            sess = orm.object_session(instance)
            existing = sess.query(CommonStorage).filter(
                CommonStorage.hashval == instance.rel.hashval).first()
            if existing is not None:
                instance.rel = existing
    before_update = before_insert


 class StorageUser(Base):
     __tablename__ = "user"
+    __mapper_args__ = dict(extension=StorageExtension())
-----

While the extension gets called and tries to replace the relation, SA
still tries to insert the "new" entry.
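
My guess is that before_insert fires after the flush plan has already
been drawn up, so reassigning rel there comes too late. Doing the swap
before the flush starts does work in a quick sketch; here I use the
session-level before_flush hook via the event API of a newer
SQLAlchemy, so this is an assumption on my part rather than a fix for
the MapperExtension code above:

```python
import hashlib
import sqlalchemy as sa
from sqlalchemy import event
from sqlalchemy.orm import (declarative_base, relationship,
                            sessionmaker, Session)

Base = declarative_base()

class CommonStorage(Base):
    __tablename__ = "common"
    hashval = sa.Column(sa.String, primary_key=True)
    value   = sa.Column(sa.LargeBinary)
    def __init__(self, v):
        self.value = v
        self.hashval = hashlib.md5(v).hexdigest()

class StorageUser(Base):
    __tablename__ = "user"
    id  = sa.Column(sa.Integer, primary_key=True)
    ref = sa.Column(sa.String, sa.ForeignKey(CommonStorage.hashval))
    rel = relationship(CommonStorage)

@event.listens_for(Session, "before_flush")
def dedup_blobs(session, flush_context, instances):
    # runs before the unit of work is planned, so swapping the
    # relation here still takes effect
    for obj in list(session.new):
        if not isinstance(obj, CommonStorage):
            continue
        with session.no_autoflush:
            existing = session.query(CommonStorage).filter(
                CommonStorage.hashval == obj.hashval).first()
        if existing is not None and existing is not obj:
            session.expunge(obj)          # drop the duplicate pending blob
            for user in list(session.new):
                if isinstance(user, StorageUser) and user.rel is obj:
                    user.rel = existing   # point at the persistent row

engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

ua, ub = StorageUser(), StorageUser()
ua.rel = CommonStorage(b"Something")
session.add(ua)
session.commit()

ub.rel = CommonStorage(b"Something")
session.add(ub)
session.commit()   # no FlushError: the duplicate blob was swapped out
blob_count = session.query(CommonStorage).count()
```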


Any way to get this implemented?

Greetings and thanks for any hint, Torsten

-- 
DYNAmore Gesellschaft fuer Ingenieurdienstleistungen mbH
Torsten Landschoff

Office Dresden
Tel: +49-(0)351-4519587
Fax: +49-(0)351-4519561

mailto:torsten.landsch...@dynamore.de
http://www.dynamore.de

Registergericht: Mannheim, HRB: 109659, Sitz: Karlsruhe,
Geschäftsführer:  Prof. Dr. K. Schweizerhof, Dipl.-Math. U. Franz

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to