On Mon, Jul 6, 2020, at 2:14 PM, Saylee M. wrote:

> Hello all,
> Hope you are fine and safe in these times!
>
> I can be easily considered a novice in SQLAlchemy.
> I am trying to pull data from a MySQL database, from a table having around
> 20 columns and more than 10 million rows.
> The table is partitioned and indexed. I am using a complex query, a UNION
> of multiple sub-queries, in the format:
>
>     with cte1 as <subquery1>,
>          cte2 as <subquery2>,
>          ...
>     select * from cte1
>     union all
>     select * from cte2;
>
> The code demands a dynamic query; the number of sub-queries depends on the
> user input.
> For illustration, I took a query having just two sub-queries.
> So, when I passed the query to MySQL directly, it took very little time
> (around 0.016 seconds), but when I passed the same query through the
> SQLAlchemy connector, it took around 600 seconds.
hi - when you say "MySQL directly", what are you referring to: your command-line client, or the Python driver? Also, which Python driver are you using?

A baseline for how much time the operation needs to take should be produced by using the Python driver that you have with the *exact* SQL query you're starting with, then fully fetching all rows in Python. It sounds a little off that the mysqlclient driver can fetch ten million rows in 0.016 seconds. At the end of this message is a test script that fetches 5 million rows of just two columns each, using the raw mysqlclient driver. mysqlclient is a native driver that is as fast as you can get. Running it on my laptop with *no network* and a local MySQL server, it takes 7 seconds to fetch that many rows. Ten million rows is a lot, and it's not going to be very fast in any case.

That said, the ORM, when it fetches full Python objects, is *much* slower than fetching rows, because there is a lot of Python overhead in building up the objects. If you have 10M rows, just for Python to set aside the memory to store 10M heavy Python objects at once will take many seconds, and once it churns into swap space the time will grow dramatically. It's unlikely you need the full-blown business-object functionality on 10M rows at once, so I would fetch columns instead of objects. There's discussion of this in the FAQ at https://docs.sqlalchemy.org/en/13/faq/performance.html#result-fetching-slowness-orm .

Additionally, the ORM can yield out ORM objects in batches using yield_per(), keeping in mind that the database driver has probably buffered the raw rows in any case.

What I would suggest is to take a look at the example fetching suite at https://docs.sqlalchemy.org/en/13/_modules/examples/performance/large_resultsets.html . This illustrates all the different ways you can fetch rows with SQLAlchemy and compares the speed of each.
you can run this suite straight from the distribution, and it will show the relative differences in speed between the different kinds of fetches. The ORM in particular has a lot of work to do in both generating Python objects and populating them, and if you fetch rows of columns instead of objects, that will save most of the time. To get a feel for the suite, here's a run from my own laptop:

    $ python -m examples.performance large_resultsets --dburl mysql://scott:tiger@localhost/test
    Running setup once...
    Tests to run: test_orm_full_objects_list, test_orm_full_objects_chunks,
    test_orm_bundles, test_orm_columns, test_core_fetchall,
    test_core_fetchmany_w_streaming, test_core_fetchmany,
    test_dbapi_fetchall_plus_append_objects, test_dbapi_fetchall_no_object

    test_orm_full_objects_list : Load fully tracked ORM objects into one big
    list(). (500000 iterations); total time 6.135940 sec

    test_orm_full_objects_chunks : Load fully tracked ORM objects a chunk at a
    time using yield_per(). (500000 iterations); total time 3.340366 sec

    test_orm_bundles : Load lightweight "bundle" objects using the ORM.
    (500000 iterations); total time 0.949388 sec

    test_orm_columns : Load individual columns into named tuples using the ORM.
    (500000 iterations); total time 0.560157 sec

    test_core_fetchall : Load Core result rows using fetchall.
    (500000 iterations); total time 0.466407 sec

    test_core_fetchmany_w_streaming : Load Core result rows using
    fetchmany/streaming. (500000 iterations); total time 0.339930 sec

    test_core_fetchmany : Load Core result rows using Core / fetchmany.
    (500000 iterations); total time 0.470984 sec

    test_dbapi_fetchall_plus_append_objects : Load rows using DBAPI fetchall(),
    generate an object for each row. (500000 iterations); total time 0.476398 sec

    test_dbapi_fetchall_no_object : Load rows using DBAPI fetchall(), don't
    make any objects.
    (500000 iterations); total time 0.330984 sec

Here's the raw-driver test script mentioned above:

    import random
    import time

    from sqlalchemy import create_engine

    e = create_engine("mysql://scott:tiger@localhost/test")

    with e.connect() as conn:
        conn.execute("drop table if exists data")
        conn.execute("create table data (x integer, y integer)")
        conn.execute(
            "insert into data (x, y) values (%s, %s)",
            [
                (random.randint(1, 10), random.randint(1, 10))
                for i in range(5000000)
            ],
        )

        # reach down to the raw mysqlclient connection and time a full fetch
        raw_mysql_connection = conn.connection.connection
        cursor = raw_mysql_connection.cursor()

        now = time.perf_counter()
        cursor.execute("select x, y from data")
        cursor.fetchall()
        cursor.close()

        total = time.perf_counter() - now
        print("total time %s" % total)

> I tried generating the query using only a text clause in one approach, and
> I also tried a combination of .select(), .where() and and_() in another.
> However, both took a very long time.
>
> I am not sure what the issue can be. It'll be great if I can get any
> pointers to reduce the time, preferably under 10 seconds!
>
> Thanks in advance!
>
> Regards,
> Saylee
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/sqlalchemy/f7b03dcd-dd38-4df1-8719-a7902b1a39d4o%40googlegroups.com
> <https://groups.google.com/d/msgid/sqlalchemy/f7b03dcd-dd38-4df1-8719-a7902b1a39d4o%40googlegroups.com?utm_medium=email&utm_source=footer>.