On Jul 19, 2010, at 11:46 AM, Adrian Price-Whelan wrote:

>> does it take a few seconds to fully fetch all the results and it only gets 
>> 1000 rows ?  or is that just to get the initial result?
> 
> I'm not sure what you mean by this - the query does return 1000 rows.


if you run a query from a SQL console tool, and the query returns hundreds 
of thousands or millions of rows, you'll begin receiving rows immediately - 
the tool displays the first screenful within a fraction of a second, even 
though scrolling through the entire result set could then take 20 minutes.   
With psycopg2, by contrast, result rows are fully buffered.   You won't see 
any result at all until every row, including all the data in every column, 
has been fetched into memory.   This creates the appearance of a vast 
difference in speed between a SQL console tool and a Python script.
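One way to test whether this buffering is the bottleneck is to stream the result in batches instead. A minimal sketch - the helper below is hypothetical, not a psycopg2 or SQLAlchemy API:

```python
def stream_rows(cursor, batch_size=100):
    """Yield rows from a DBAPI cursor in fixed-size batches.

    Note: with psycopg2 this only avoids the full client-side buffering
    when the cursor is a server-side ("named") cursor, e.g.
    conn.cursor(name="my_cursor"); a regular cursor has already pulled
    the entire result set into memory by the time execute() returns.
    """
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row
```

With a named cursor you can then time how long the first row takes to arrive versus the last; if the first row shows up quickly, the cost is in transferring and converting the large array columns, not in the query plan.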




> 
>> also if any of the individual columns are very large BLOBs or perhaps very 
>> large PG arrays that would add to the overhead of a fetch.
> 
> There definitely are columns of PG arrays, ~4000 elements each, so going
> back to my first email it seems like the culprit here could be the ARRAYs.
> 
> Thanks for your help,
> 
> Adrian
> 
> On Jul 19, 10:10 am, Michael Bayer <mike...@zzzcomputing.com> wrote:
>> On Jul 19, 2010, at 9:52 AM, Adrian Price-Whelan wrote:
>> 
>> 
>> 
>> 
>> 
>>> Hi,
>> 
>>> I was just wondering at first whether there was a known issue with
>>> ARRAY types, but here is the query:
>>> spectra =
>>> session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename
>>> == spPlateFilename).all()
>> 
>>> It should return ~1000 objects equaling about 120MB of data. In
>>> Python, this query takes >10 minutes to complete,  but as a SQL query
>>> (copying and pasting the echo'd command) it takes a few seconds:
>>> SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra,
>>> spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values,
>>> spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk,
>>> spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS
>>> spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask,
>>> spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN
>>> spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk
>>> WHERE spectrum_header.filename = 'spPlate-3586-55181.fits'
>> 
>>> autoflush and autocommit are both set to False.
>> 
>>> It seems like a straightforward query so I'm confused as to what could
>>> be getting hung up.
>> 
>> does it take a few seconds to fully fetch all the results and it only gets 
>> 1000 rows ?  or is that just to get the initial result? these are different 
>> things.   also if any of the individual columns are very large BLOBs or 
>> perhaps very large PG arrays that would add to the overhead of a fetch.   
>> You can also try writing a DBAPI-only script that runs the query, as well as 
>> running engine.execute(myquery.statement) and fetching rows that way to see 
>> if some in-object process is the factor (which is unlikely).
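(As a concrete sketch of the DBAPI-only test suggested above - the helper name here is made up for illustration:)

```python
import time

def time_query(cursor, sql):
    """Run a query on a DBAPI cursor, timing execute() and fetchall()
    separately, to distinguish server-side query time from
    row-transfer/buffering time on the client."""
    t0 = time.time()
    cursor.execute(sql)
    t1 = time.time()
    rows = cursor.fetchall()
    t2 = time.time()
    return t1 - t0, t2 - t1, rows
```

e.g. with psycopg2, pass `conn.cursor()` and the echoed SELECT; if the fetchall() timing dominates, the time is going into pulling the numeric[] columns over the wire rather than into the query itself.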
>> 
>> 
>> 
>> 
>> 
>>> Thanks for any insight,
>> 
>>> Adrian
>> 
>>> On Jul 16, 10:24 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
>>>> You absolutely need to turn on echoing and locate the specific SQL query 
>>>> which causes the issue.   Queries can take excessive time for a very wide 
>>>> variety of reasons.
>> 
>>>> On Jul 16, 2010, at 12:56 PM, Adrian Price-Whelan wrote:
>> 
>>>>> Hello --
>> 
>>>>> I'm working with a database populated with data that originally lived in 
>>>>> files of ~150MB each. We are dealing with a lot of 
>>>>> data that is being stored in the database using the 'ARRAY' datatype, 
>>>>> specifically numeric[]. After loading some of the data into the database 
>>>>> I tried performing a query to get back some data, and comparing it with 
>>>>> code that reads directly from the file system - but the database query 
>>>>> took ~50 times longer to complete. For instance, to retrieve 100 records 
>>>>> that contain a few 4000 element arrays each using the code that reads the 
>>>>> filesystem it took less than a second, but the query on the database took 
>>>>> around 25 seconds to complete. Has anyone else had issues with array 
>>>>> types slowing down queries or does this sound more like another issue?
>> 
>>>>> Thanks!
>>>>> Adrian
>> 
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups 
>>>>> "sqlalchemy" group.
>>>>> To post to this group, send email to sqlalch...@googlegroups.com.
>>>>> To unsubscribe from this group, send email to 
>>>>> sqlalchemy+unsubscr...@googlegroups.com.
>>>>> For more options, visit this group at 
>>>>> http://groups.google.com/group/sqlalchemy?hl=en.
>> 
> 
> 

