Re: [sqlalchemy] Why is SQLAlchemy insert with sqlite 25 times slower than using sqlite3 directly?

Michael Bayer Wed, 01 Aug 2012 18:18:59 -0700

Here is my answer reposted from http://stackoverflow.com/a/11769768/34549.


The SQLAlchemy ORM uses the unit of work pattern when synchronizing changes to 
the database. This pattern goes far beyond simple "inserts" of data. It 
includes that attributes which are assigned on objects are received using an 
attribute instrumentation system which tracks changes on objects as they are 
made, includes that all rows inserted are tracked in an identity map which has 
the effect that for each row SQLAlchemy must retrieve its "last inserted id" if 
not already given, and also involves that rows to be inserted are scanned and 
sorted for dependencies as needed. Objects are also subject to a fair degree of 
bookkeeping in order to keep all of this running, which for a very large number 
of rows at once can create  an inordinate amount of time spent with large data 
structures, hence it's best to chunk these.

Basically, unit of work is a large degree of automation in order to automate 
the task of persisting a complex object graph into a relational database with 
no explicit persistence code, and this automation has a price.

So ORMs are basically not intended for high-performance bulk inserts. This is 
the whole reason why SQLAlchemy has two separate libraries, which you'll note 
if you look at http://docs.sqlalchemy.org/en/rel_0_7/index.html you'll see two 
distinct halves to the index page - one for the ORM and one for the Core. You 
cannot use SQLAlchemy effectively without understanding both.

For the use case of fast bulk inserts, SQLAlchemy provides the core, which is 
the SQL generation and execution system that the ORM builds on top of. Using 
this system effectively we can produce an INSERT that is competitive with the 
raw SQLite version. The script below illustrates this, as well as an ORM 
version that pre-assigns primary key identifiers so that the ORM can use 
executemany() to insert rows. Both ORM versions chunk the flushes at 1000 
records at a time as well which has a significant performance impact.

Runtimes observed here are:

SqlAlchemy ORM: Total time for 100000 records 16.4133379459 secs
SqlAlchemy ORM pk given: Total time for 100000 records 9.77570986748 secs
SqlAlchemy Core: Total time for 100000 records 0.568737983704 secs
sqlite3: Total time for 100000 records 0.595796823502 sec

script:

import time
import sqlite3

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String,  create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

Base = declarative_base()
DBSession = scoped_session(sessionmaker())

class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))

def init_sqlalchemy(dbname = 'sqlite:///sqlalchemy.db'):
    global engine
    engine = create_engine(dbname, echo=False)
    DBSession.remove()
    DBSession.configure(bind=engine, autoflush=False, expire_on_commit=False)
    Base.metadata.drop_all(engine)
    Base.metadata.create_all(engine)

def test_sqlalchemy_orm(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    for i in range(n):
        customer = Customer()
        customer.name = 'NAME ' + str(i)
        DBSession.add(customer)
        if i % 1000 == 0:
            DBSession.flush()
    DBSession.commit()
    print "SqlAlchemy ORM: Total time for " + str(n) + " records " + 
str(time.time() - t0) + " secs"

def test_sqlalchemy_orm_pk_given(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    for i in range(n):
        customer = Customer(id=i+1, name="NAME " + str(i))
        DBSession.add(customer)
        if i % 1000 == 0:
            DBSession.flush()
    DBSession.commit()
    print "SqlAlchemy ORM pk given: Total time for " + str(n) + " records " + 
str(time.time() - t0) + " secs"

def test_sqlalchemy_core(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    engine.execute(
        Customer.__table__.insert(),
        [{"name":'NAME ' + str(i)} for i in range(n)]
    )
    print "SqlAlchemy Core: Total time for " + str(n) + " records " + 
str(time.time() - t0) + " secs"

def init_sqlite3(dbname):
    conn = sqlite3.connect(dbname)
    c = conn.cursor()
    c.execute("DROP TABLE IF EXISTS customer")
    c.execute("CREATE TABLE customer (id INTEGER NOT NULL, name VARCHAR(255), 
PRIMARY KEY(id))")
    conn.commit()
    return conn

def test_sqlite3(n=100000, dbname = 'sqlite3.db'):
    conn = init_sqlite3(dbname)
    c = conn.cursor()
    t0 = time.time()
    for i in range(n):
        row = ('NAME ' + str(i),)
        c.execute("INSERT INTO customer (name) VALUES (?)", row)
    conn.commit()
    print "sqlite3: Total time for " + str(n) + " records " + str(time.time() - 
t0) + " sec"

if __name__ == '__main__':
    test_sqlalchemy_orm(100000)
    test_sqlalchemy_orm_pk_given(100000)
    test_sqlalchemy_core(100000)
    test_sqlite3(100000)


On Aug 1, 2012, at 9:05 PM, Braddock Gaskill wrote:

> 
> Why is this simple test case inserting 100,000 rows 25 times slower with
> SQLAlchemy than it is using the sqlite3 driver directly? I have seen
> similar slowdowns in real-world applications. Am I doing something wrong?
> 
> I have asked this on StackOverflow.  The question with my exact code is
> at:
> 
> http://stackoverflow.com/questions/11769366/why-is-sqlalchemy-insert-with-sqlite-25-times-slower-than-using-sqlite3-directly
> 
> 
> 
> #!/usr/bin/env python
> # Whis is SQLAlchemy with SQLite so slow?
> # Output from this program:
> # SqlAlchemy: Total time for 100000 records 10.74 secs
> # sqlite3:    Total time for 100000 records  0.40 secs
> # By Braddock Gaskill 8/1/2012
> 
> import time
> import sqlite3
> 
> from sqlalchemy.ext.declarative import declarative_base
> from sqlalchemy import Column, Integer, String,  create_engine 
> from sqlalchemy.orm import scoped_session, sessionmaker
> 
> Base = declarative_base()
> DBSession = scoped_session(sessionmaker())
> 
> class Customer(Base):
>    __tablename__ = "customer"
>    id = Column(Integer, primary_key=True)
>    name = Column(String(255))
> 
> def init_sqlalchemy(dbname = 'sqlite:///sqlalchemy.db'):
>    engine  = create_engine(dbname, echo=False)
>    DBSession.configure(bind=engine, autoflush=False,
> expire_on_commit=False)
>    Base.metadata.drop_all(engine)
>    Base.metadata.create_all(engine)
> 
> def test_sqlalchemy(n=100000):
>    init_sqlalchemy()
>    t0 = time.time()
>    for i in range(n):
>        customer = Customer()
>        customer.name = 'NAME ' + str(i)
>        DBSession.add(customer)
>    DBSession.commit()
>    print "SqlAlchemy: Total time for " + str(n) + " records " +
> str(time.time() - t0) + " secs"
> 
> def test_sqlalchemy_insert(n=10000):
>    t0 = time.time()
>    for i in range(n):
>        ins = Customer.__table__.insert().values(name = 'NAME ' + str(i))
>        DBSession.execute(ins)
>    DBSession.commit()
>    print "SqlAlchemy Insert: Total time for " + str(n) + " records " +
> str(time.time() - t0) + " secs"
> 
> def init_sqlite3(dbname):
>    conn = sqlite3.connect(dbname)
>    c = conn.cursor()
>    c.execute("DROP TABLE IF EXISTS customer")
>    c.execute("CREATE TABLE customer (id INTEGER NOT NULL, name
> VARCHAR(255), PRIMARY KEY(id))")
>    conn.commit()
>    return conn
> 
> def test_sqlite3(n=100000, dbname = 'sqlite3.db'):
>    conn = init_sqlite3(dbname)
>    c = conn.cursor()
>    t0 = time.time()
>    for i in range(n):
>        row = ('NAME ' + str(i),)
>        c.execute("INSERT INTO customer (name) VALUES (?)", row)
>    conn.commit()
>    print "sqlite3: Total time for " + str(n) + " records " +
> str(time.time() - t0) + " sec"
> 
> if __name__ == '__main__':
>    test_sqlalchemy(100000)
>    test_sqlite3(100000)
>    test_sqlalchemy_insert(100000)
> 
> ~
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> To unsubscribe from this group, send email to 
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en.
> 

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Why is SQLAlchemy insert with sqlite 25 times slower than using sqlite3 directly?

Reply via email to