Hi

I did some profiling of a simple select-type query to check SA's
performance for such use cases.

The test consists of a very simple Order record:

order = Table('tblorder', metadata,
    Column('order_id', Integer, primary_key = True),
    Column('order_user_id', Integer),
    Column('order_state', Integer),
    Column('order_price', Integer),
    Column('order_delivery_cost', Integer),
    Column('order_creation_date', DateTime),
    Column('order_modification_date', DateTime),
    Column('order_last_payment_check', DateTime),
    Column('order_delivery_address_id', Integer),
)

and some related log lines that keep track of what happens to the
order:

orderlog = Table('tblorderlog', metadata,
    Column('orderlog_id', Integer, primary_key = True),
    Column('orderlog_order_id', Integer, ForeignKey('tblorder.order_id')),
    Column('orderlog_creation_date', DateTime),
    Column('orderlog_message', String),
    Column('orderlog_priority', Integer),
)

On the ORM side of things these two tables have a simple one-to-one
mapping to classes Order and OrderLog, and the log lines are attached
to the order through the mapper property:

'orderLogs': relation(OrderLog, backref='order')
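
For completeness, the whole mapping is set up roughly along these
lines (classic mapper()/relation() style; the exact code in my test
may differ slightly, this is just the relevant shape):

from sqlalchemy.orm import mapper, relation

class Order(object):
    pass

class OrderLog(object):
    pass

mapper(OrderLog, orderlog)
mapper(Order, order, properties={
    # one order has many log lines; the backref gives OrderLog.order
    'orderLogs': relation(OrderLog, backref='order'),
})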

The main loop of the test program is the following:

        times = 1000
        start = time.time()
        for i in range(times):
            # fetch the same order on every iteration
            order = Session.query(Order).get(72244)

            # touch the relation so the orderlog lines are loaded too
            for log in order.orderLogs:
                pass

            # clear the identity map so both queries run again next time
            Session.clear()

        end = time.time()

Note that it is the same Order I am fetching every time, because I am
interested in SA overhead and not in the speed of the database
(MySQL) itself; this way MySQL serves the row from memory and is not
disk bound (I also made sure the test is not network bound). Also
note that the session is cleared on each iteration through the loop,
to force SA to perform the two queries and the corresponding mapping
overhead (one query to fetch the order, one query to fetch the 17
corresponding orderlog lines) every time.
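
For reference, the profile shown below was collected roughly like
this (run_test is just a hypothetical wrapper around the loop above;
the exact invocation may have differed a bit):

import cProfile
import pstats

# run_test() wraps the timing loop shown above
cProfile.run('run_test()', 'orders.prof')

stats = pstats.Stats('orders.prof')
stats.sort_stats('time').print_stats(20)   # top 20 by internal time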

Profiling with cProfile (python 2.5.1) gave the following results
(sorted by total time spent in each function):

        2923724 function calls (2880671 primitive calls) in 5.965 CPU seconds

   Ordered by: internal time
   List reduced from 498 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2000    0.585    0.000    0.585    0.000 {method 'query' of '_mysql.connection' objects}
    18000    0.192    0.000    1.715    0.000 mapper.py:1407(_instance)
    18000    0.160    0.000    0.801    0.000 mapper.py:1381(populate_state)
     2000    0.153    0.000    0.259    0.000 base.py:1448(_init_metadata)
    20000    0.153    0.000    0.198    0.000 times.py:43(DateTime_or_None)
    36000    0.147    0.000    0.147    0.000 identity.py:91(__contains__)
    16000    0.139    0.000    0.270    0.000 compiler.py:239(visit_column)
   112000    0.130    0.000    0.240    0.000 base.py:1363(__getitem__)
    94000    0.127    0.000    0.318    0.000 strategies.py:71(new_execute)
    18000    0.122    0.000    0.122    0.000 attributes.py:745(__init__)
42000/2000    0.114    0.000    1.233    0.001 compiler.py:176(process)
162036/162022    0.113    0.000    0.134    0.000 {getattr}
    14000    0.106    0.000    0.541    0.000 compiler.py:228(visit_label)
     2000    0.103    0.000    1.222    0.001 compiler.py:466(visit_select)
   112000    0.103    0.000    0.110    0.000 base.py:1617(_get_col)
    32000    0.089    0.000    0.158    0.000 compiler.py:401(_truncated_identifier)
    72000    0.086    0.000    0.096    0.000 util.py:806(add)
   135101    0.085    0.000    0.085    0.000 {isinstance}
    18000    0.073    0.000    0.181    0.000 identity.py:104(add)
    46000    0.068    0.000    0.080    0.000 {built-in method sub}

What we see here is that, of the total of 5.965 seconds, the test
spent only about 0.585 seconds performing the actual SQL queries, or
about 10% of the total time. The other 90% of the time is spent on SA
overhead (except for the times.py line, which is somewhere in the
MySQLdb driver).

If I look at the cumulative output of the profiler (omitted for
brevity) I see, for instance, that SA spends 1.3 seconds compiling
the SQL expressions (it apparently recompiles the same SQL expression
on each iteration).

My question to the SA community would be: how much ORM overhead do
you find acceptable? And how does SA compare in this regard to, for
instance, Java's Hibernate or other ORM solutions?

I would love to help out trying to reduce this overhead, but I have
no idea what the SA devs have in mind for the future of SA with
regard to speed (e.g. speed vs. feature completeness). The tutorials
and documentation currently put a strong emphasis on use cases
highlighting the very dynamic nature of building complex queries with
SA's ORM tools, and not so much on how to perform simple (and static)
queries fast using the ORM.

Maybe there is a way to memoize the result of the compile step so
that it does not need to be redone every time while the Query stays
the same?
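
As a crude illustration of the direction I mean, at the SQL-expression
layer one can already build and compile a statement once and reuse it.
The sketch below bypasses the ORM mapping step entirely and assumes an
engine bound to the same metadata, plus that the SA version in use
accepts executing a pre-compiled statement with keyword parameters, so
it is not a drop-in replacement for the Query above, just the idea:

from sqlalchemy import bindparam

# build the SELECT once, with a bind parameter instead of a literal id
order_by_id = order.select(order.c.order_id == bindparam('id'))

# compile it once up front and reuse the compiled form in the loop
compiled = order_by_id.compile(bind=engine)

conn = engine.connect()
for i in range(times):
    row = conn.execute(compiled, id=72244).fetchone()
conn.close()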
