On Mon, Jul 6, 2020, at 2:14 PM, Saylee M. wrote:
> Hello all, 
> Hope you are fine and safe in these times!
> 
> I can be easily considered as a novice in SQLAlchemy.
> I am trying to pull data from a MYSQL database from a table having around 20 
> columns and more than 10 million rows. 
> The table is partitioned and indexed. I am using a complex query, having 
> UNION of multiple sub-queries, in the format:
> 
> with cte1 as <subquery1>, 
> cte2 as <subquery2>, 
> .. 
> select * from cte1
> union all
> select * from cte2
> ;
> 
> The code demands it to be a dynamic query and number of sub-queries depends 
> on the user input. 
> For understanding purpose, I took a query having two sub-queries. 
> So, when I passed the query to MySQL directly, it took very less time (around 
> 0.016 seconds) but when I passed the same 
> query through SQLAlchemy connector, it took around 600 seconds.

hi -

when you say "mysql directly", what are you referring to: your command line 
client, or the Python driver? Also, which Python driver are you using? 

A fair baseline for how much time the operation should take is to run the 
*exact* SQL query you're starting with through the Python driver you have, then 
fully fetch all rows in Python. It sounds a little off that the mysqlclient 
driver could fetch ten million rows in 0.016 seconds. At the end of this 
message is a test script that fetches 5 million rows of just two columns each, 
using the raw mysqlclient driver; mysqlclient is a native driver and is about 
as fast as you can get. Running it on my laptop with *no network* against a 
local MySQL server, it takes 7 seconds to fetch that many rows. Ten million 
rows is a lot, and it's not going to be very fast in any case. 

That said, the ORM when it fetches full Python objects is *much* slower than 
fetching plain rows, because there is a lot of Python overhead in building up 
the objects. If you have 10M rows, just setting aside the memory to store 10M 
heavy Python objects at once will take many seconds, and once the process 
churns into swap space the time will grow dramatically. It's unlikely you need 
the full-blown business object functionality on 10M rows at once, so I would 
fetch columns instead of objects. There's discussion about this in the FAQ at 
https://docs.sqlalchemy.org/en/13/faq/performance.html#result-fetching-slowness-orm
 . Additionally, the ORM can yield out ORM objects in batches using 
yield_per(), keeping in mind that the database driver has probably buffered the 
raw rows in any case.
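
To sketch what yield_per() looks like (a hypothetical one-column model against 
an in-memory SQLite database so the sketch is self-contained; this is not your 
schema, just an illustration):

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session

Base = declarative_base()


class Data(Base):
    __tablename__ = "data"
    id = Column(Integer, primary_key=True)
    x = Column(Integer)


# in-memory SQLite just so the sketch runs anywhere
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add_all([Data(x=i) for i in range(1000)])
session.commit()

# yield_per() yields ORM objects in batches of 100 rather than
# materializing all of them into one big list up front
total = 0
for obj in session.query(Data).yield_per(100):
    total += obj.x
print(total)  # 499500
```

The point is that only ~100 objects need to exist at a time on the Python side, 
even though the driver may still have buffered the raw rows.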

What I would suggest is taking a look at the example fetching suite at 
https://docs.sqlalchemy.org/en/13/_modules/examples/performance/large_resultsets.html
 . It illustrates all the different ways you can fetch rows with SQLAlchemy 
and compares the speed of each. You can run this suite straight from the 
distribution, and it will show the relative differences in speed between the 
different kinds of fetches. The ORM in particular has a lot of work to do in 
both generating Python objects and populating them, and if you fetch rows of 
columns instead of objects, that will save most of the time.
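
For instance, fetching individual columns through the ORM returns lightweight 
named tuples rather than full identity-mapped objects. A sketch (the model name 
and the in-memory SQLite database are made up here so the example is 
self-contained):

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session

Base = declarative_base()


class Data(Base):
    __tablename__ = "data"
    id = Column(Integer, primary_key=True)
    x = Column(Integer)
    y = Column(Integer)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add_all([Data(x=i, y=i * 2) for i in range(5)])
session.commit()

# querying columns skips object construction and identity-map
# bookkeeping entirely; each result is a simple named tuple
rows = session.query(Data.x, Data.y).all()
print(rows)  # [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)]
```

This corresponds to the test_orm_columns case in the suite below, which is an 
order of magnitude faster than loading full objects.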

To get a feel for this suite, here's a run from my own laptop:

$ python -m examples.performance large_resultsets --dburl 
mysql://scott:tiger@localhost/test
Running setup once...
Tests to run: test_orm_full_objects_list, test_orm_full_objects_chunks, 
test_orm_bundles, test_orm_columns, test_core_fetchall, 
test_core_fetchmany_w_streaming, test_core_fetchmany, 
test_dbapi_fetchall_plus_append_objects, test_dbapi_fetchall_no_object
test_orm_full_objects_list : Load fully tracked ORM objects into one big 
list(). (500000 iterations); total time 6.135940 sec
test_orm_full_objects_chunks : Load fully tracked ORM objects a chunk at a time 
using yield_per(). (500000 iterations); total time 3.340366 sec
test_orm_bundles : Load lightweight "bundle" objects using the ORM. (500000 
iterations); total time 0.949388 sec
test_orm_columns : Load individual columns into named tuples using the ORM. 
(500000 iterations); total time 0.560157 sec
test_core_fetchall : Load Core result rows using fetchall. (500000 iterations); 
total time 0.466407 sec
test_core_fetchmany_w_streaming : Load Core result rows using 
fetchmany/streaming. (500000 iterations); total time 0.339930 sec
test_core_fetchmany : Load Core result rows using Core / fetchmany. (500000 
iterations); total time 0.470984 sec
test_dbapi_fetchall_plus_append_objects : Load rows using DBAPI fetchall(), 
generate an object for each row. (500000 iterations); total time 0.476398 sec
test_dbapi_fetchall_no_object : Load rows using DBAPI fetchall(), don't make 
any objects. (500000 iterations); total time 0.330984 sec
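
The test_core_fetchmany_w_streaming case above can be sketched roughly like 
this (again SQLite in-memory stands in for MySQL; stream_results is a no-op on 
SQLite but maps to an unbuffered / server-side cursor on drivers that support 
one):

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")
with engine.connect() as conn:
    conn.execute(text("create table data (x integer)"))
    conn.execute(
        text("insert into data (x) values (:x)"),
        [{"x": i} for i in range(10)],
    )

    # stream_results asks the driver not to buffer the full result set
    result = conn.execution_options(stream_results=True).execute(
        text("select x from data")
    )
    total = 0
    while True:
        chunk = result.fetchmany(4)  # pull rows in fixed-size batches
        if not chunk:
            break
        total += sum(row[0] for row in chunk)
print(total)  # 45
```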





import random
import time

from sqlalchemy import create_engine

e = create_engine("mysql://scott:tiger@localhost/test")

with e.connect() as conn:
    conn.execute("drop table if exists data")
    conn.execute("create table data (x integer, y integer)")
    conn.execute(
        "insert into data (x, y) values (%s, %s)",
        [
            (random.randint(1, 10), random.randint(1, 10))
            for i in range(5000000)
        ],
    )

    # reach down to the raw mysqlclient connection underneath the
    # SQLAlchemy Connection and pool
    raw_mysql_connection = conn.connection.connection

    cursor = raw_mysql_connection.cursor()

    now = time.perf_counter()
    cursor.execute("select x, y from data")
    cursor.fetchall()
    cursor.close()
    total = time.perf_counter() - now
    print("total time %s" % total)




> I tried generating the query using only text clause in one approach and I 
> also tried combination of .select(), .where() and .and_ query approach.
> However, both were taking a very long time.
> 
> I am not sure what can be issue. It'll be great if I can get any pointers to 
> reduce the time, preferably under 10 seconds!
> 
> Thanks in advance!
> 
> Regards,
> Saylee
> 

> --
>  SQLAlchemy - 
>  The Python SQL Toolkit and Object Relational Mapper
> 
> http://www.sqlalchemy.org/
> 
>  To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
>  --- 
>  You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
>  To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
>  To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/f7b03dcd-dd38-4df1-8719-a7902b1a39d4o%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/sqlalchemy/f7b03dcd-dd38-4df1-8719-a7902b1a39d4o%40googlegroups.com?utm_medium=email&utm_source=footer>.
