When you SELECT in the database UI tool, you are just displaying raw data.

When you SELECT within your code snippets above, Python is creating pandas 
DataFrame objects for the results.

These two things are not comparable at all. Converting the SQL data to 
Python data structures in pandas (and in SQLAlchemy's ORM) carries a lot of 
overhead - and that overhead grows with the result size.
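As a rough illustration, here's a self-contained sketch that times a raw 
SQLAlchemy fetch against pd.read_sql_query() on the same query. It uses an 
in-memory SQLite database as a stand-in (the original thread is against 
Sybase over pyodbc, so absolute numbers will differ), but it shows where the 
DataFrame-construction overhead enters:

```python
import time

import pandas as pd
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for the real database here.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE t (a INTEGER, b TEXT)"))
    conn.execute(
        text("INSERT INTO t (a, b) VALUES (:a, :b)"),
        [{"a": i, "b": str(i)} for i in range(100_000)],
    )

query = "SELECT a, b FROM t"

# Raw fetch: rows come back as lightweight Row objects.
start = time.perf_counter()
with engine.connect() as conn:
    for row in conn.execute(text(query)).fetchall():
        pass
raw_elapsed = time.perf_counter() - start

# pandas: the same rows, plus DataFrame construction on top.
start = time.perf_counter()
df = pd.read_sql_query(query, engine)
pandas_elapsed = time.perf_counter() - start

print(f"raw fetch: {raw_elapsed:.3f}s")
print(f"read_sql:  {pandas_elapsed:.3f}s")
```

Comparing the two timings on your own database and query separates "the 
query is slow" from "converting the results is slow".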

You can use memory and code profiling tools to explore this and see where 
the time goes. The best approach, though, is what Philip suggested above: 
drop pandas entirely, so you can see how Python/SQLAlchemy handles the raw 
data.
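For example, the standard library's cProfile will show whether the time is 
going to the driver, to SQLAlchemy, or to result processing. This is just a 
sketch - the connection string and query below are placeholders for your own:

```python
import cProfile
import pstats

from sqlalchemy import create_engine, text

# Placeholder connection string and query -- substitute your own.
engine = create_engine("sqlite://")
query = "SELECT 1"

def run_query():
    # Iterate the full result set without building any DataFrame.
    with engine.connect() as conn:
        for row in conn.execute(text(query)).fetchall():
            pass

# Profile the fetch and sort by cumulative time, so the slowest
# call chains appear at the top of the report.
profiler = cProfile.Profile()
profiler.enable()
run_query()
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```

If the top entries are in the driver's fetch calls, the time is spent on the 
wire or in the database; if they're in pandas internals, the conversion is 
the bottleneck.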





On Wednesday, June 8, 2022 at 9:28:38 AM UTC-4 Trainer Go wrote:

> Hello Phil,
>
> I tested both, without printing the result.
>
> table_df = pd.read_sql_query('SELECT ...', engine)
> #print(table_df)
> #query = "SELECT"
> #for row in conn.execute(query).fetchall():
> #    pass
>
>
> both have nearly the same runtime, so this is not my problem. And yes, 
> they are the same queries, because I copy-pasted the SELECT from my DB UI, 
> where I first tested the results and the runtime. I expected the same 
> runtime in my program, but no ;)
>
> Greetings, Manuel
>
> Philip Semanchuk wrote on Wednesday, June 8, 2022 at 15:04:08 UTC+2:
>
>>
>>
>> > On Jun 8, 2022, at 8:29 AM, Trainer Go <maego...@gmail.com> wrote: 
>> > 
>> > When I'm using pandas with pd.read_sql_query() with chunksize to 
>> > minimize the memory usage, there is no difference between both runtimes. 
>>
>> Do you know that, or is that speculation? 
>>
>> > 
>> > table_df = pd.read_sql_query('''select ...''', engine, chunksize=30000) 
>> > 
>> > for df in table_df: 
>> >     print(df) 
>> > 
>> > the runtime is nearly the same, around 5 minutes 
>>
>> Printing to the screen also takes time, and your terminal probably 
>> buffers the results, which requires memory allocation. I’m not saying this 
>> is your problem (it probably isn’t), but your test still involves pandas 
>> and your terminal, both of which cloud the issue. You would benefit from 
>> simplifying your tests. 
>>
>> Did you try this suggestion from my previous email? 
>>
>>
>> > for row in conn.execute(my_query).fetchall(): 
>> >     pass 
>>
>> Also, are you 100% sure you’re executing the same query from SQLAlchemy 
>> that you’re pasting into your DB UI? 
>>
>> Cheers 
>> Philip 
>>
>>
>>
>> > 
>> > 
>> > 
>> > #print(table_df) result: <generator object SQLDatabase._query_iterator 
>> > at 0x0DC69C30> 
>> > I don't know if the query will be triggered by using print(table_df); 
>> > the result is <generator object SQLDatabase._query_iterator at 
>> > 0x0DC69C30>. 
>> > 
>> > But the runtime is 6 seconds, like in the DB UI I'm using. 
>> > 
>> > I have no clue what to do. 
>> > 
>> > Greetings Manuel 
>> > 
>> > Trainer Go wrote on Wednesday, June 8, 2022 at 09:27:04 UTC+2: 
>> > thank you Philip, 
>> > 
>> > I will test it today. 
>> > 
>> > 
>> > Greetings Manuel 
>> > 
>> > Philip Semanchuk wrote on Tuesday, June 7, 2022 at 17:13:28 UTC+2: 
>> > 
>> > 
>> > > On Jun 7, 2022, at 5:46 AM, Trainer Go <maego...@gmail.com> wrote: 
>> > > 
>> > > Hello guys, 
>> > > 
>> > > I'm executing 2 queries in my Python program with SQLAlchemy using 
>> > > the pyodbc driver. 
>> > > The database is an Adaptive SQL Anywhere Version 7, 32-bit. 
>> > > 
>> > > When I'm executing the queries in a DB UI, it takes 5-6 seconds for 
>> > > both together, and when I'm using the same queries in my Python 
>> > > program it takes 5-6 minutes instead of 6 seconds. What am I doing 
>> > > wrong? I'm new at this. 
>> > 
>> > To start, debug one query at a time, not two. 
>> > 
>> > Second, when you test a query in your DB UI, you’re probably already 
>> connected to the database. Your Python program has to make the connection — 
>> that’s an extra step, and it might be slow. If you step through the Python 
>> program in the debugger, you can execute one statement at a time (the 
>> connection and the query) to understand how long each step takes. That will 
>> help to isolate the problem. 
>> > 
>> > Third, keep in mind that receiving results takes time too. If your DB 
>> UI is written in C or some other language that allocates memory very 
>> efficiently, it might be a lot faster than building a Pandas dataframe. 
>> > 
>> > You might want to eliminate Pandas entirely so you don’t have to 
>> question whether or not that’s the source of your slowdown. You could do 
>> this instead - 
>> > 
>> > for row in conn.execute(my_query).fetchall(): 
>> >     pass 
>> > 
>> > That will force your Python program to iterate over the result set 
>> without being forced to allocate memory for all the results. 
>> > 
>> > Hope this helps 
>> > Philip 
>>
>>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/d9f2fcd2-1ccf-44da-b71d-9e646065aa6an%40googlegroups.com.
