Re: [sqlalchemy] More efficient Insert mechanism

Michael Bayer Tue, 02 Feb 2010 08:44:47 -0800

Mati Skiva wrote:
> I believe I took these issues into account.
> But just to be sure, maybe you can highlight something for me.
>
> I assume the followings about the process of session.add:
> * after session.add is called, the objects are placed in a to-do pool
> (maybe)
> * for self-generating-id objects, a special process is used, one that
> fetches the generated id after the insert
> * for no self-generating-id objects, a regular process of insert is used
>
> I came to this conclusion, because otherwise, after each insert all the
> data of the row is retrieved and placed inside the object. Which covers
> the newly generated id.


Since you're interested in hacking SQLAlchemy internals, it would helpful
if you could familiarize yourself fully with the SQL expression and
execution API, all of which is public and documented.   "id generation" is
a standard feature of this system and occurs outside of the ORM.  Fetching
the ID for a new row with the SQLA expression API is a matter of calling a
ResultProxy method which the ORM only calls if it doesn't already know the
full set of primary key columns for a specific object.   The backend
implementation of "id fetching" varies wildly across all dialects as well
as versions of each database in use.

The flush procedure updates or expires those fields of the object which
were known to be generated by the database.   Fields which are immediately
updated include some or all of the primary key columns.  Fields which are
expired include any other columns which were generated via server-side SQL
expressions or known schema-level defaults - these are fetched when the
object's expired attributes are next accessed, if at all.

The SQL compiler procedure is aware of these columns when it generates an
individual insert() construct, given a list of parameters which are to be
bound literals and others which are to be embedded SQL expressions.   When
executed, the ResultProxy API provides these details fully - you should
familiarize yourself with all of its methods.

I would also point you to http://www.sqlalchemy.org/trac/ticket/1518 ,
which suggests that we should open up Mapper to being subclassable (for
much more exotic purposes than what you have here).  However, this
subclassing is possible now, which is where you can have your "insertmany"
functionality rolled in a rudimental way for your immediate use case
without burdening SQLAlchemy core with the need to have a fully
tested/supported/performant feature.   See the attached example.

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

from sqlalchemy.orm.mapper import Mapper

class OptimizedInsertMapper(Mapper):
    
    def _save_obj(self, states, uowtransaction, postupdate=False, post_update_cols=None, single=False):
        # determine which of the incoming "states" are inserts
        inserts = [s for s in states if not s.key]
        
        # split off into inserts, updates
        states = list(set(states).difference(inserts))

        # get max ID of the table
        max_id = uowtransaction.session.scalar(select([func.max(self.primary_key[0])]))
        if not max_id:
            max_id = 0
            
        # insertmany().   assume the incoming states are all of the same format.
        uowtransaction.session.execute(
            self.local_table.insert(),
            [s.dict for s in inserts]
        )
        
        # guess ids - establish state on the objects.
        for i, s in enumerate(inserts):
            s.dict['id'] = max_id + i + 1

        # send the updates to the regular _save_obj
        Mapper._save_obj(self, states, uowtransaction, postupdate=postupdate, post_update_cols=post_update_cols, single=single)

if __name__ == '__main__':
    from sqlalchemy import *
    from sqlalchemy.orm  import *
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base(mapper=OptimizedInsertMapper)

    class Widget(Base):
        __tablename__ = 'widget'
    
        id = Column(Integer, primary_key=True)
        data = Column(String)
        related = relation("Related")
    
        def __init__(self, data):
            self.data = data
            self.related = [Related() for i in xrange(5)]
        
    class Related(Base):
        __tablename__ = 'related'
    
        id = Column(Integer, primary_key=True)
        widget_id = Column(Integer, ForeignKey('widget.id'))

    e = create_engine('sqlite://', echo=True)
    Base.metadata.create_all(e)

    sess = sessionmaker(e)()

    widgets = [Widget(data="widget %d" % i) for i in xrange(10)]
    sess.add_all(widgets)
    sess.commit()

    # assert everything was generated
    for w in widgets:
        assert w.id
        for r in w.related:
            assert r.id
            assert r.widget_id == w.id
        assert w in sess

    w1 = sess.query(Widget).filter_by(data="widget 5").one()

    w1.data = 'widget 5 modified'
    sess.add_all([Widget(data="widget %d" % i) for i in range(11, 20)])
    sess.commit()

Re: [sqlalchemy] More efficient Insert mechanism

Reply via email to