Also, as a follow-up: inserting these ARRAYs into the database is slower than expected as well. Granted, it is looping over 1000 objects and inserting a few 4000-element arrays for each object, but doing one big SQL query takes considerably less time than calling session.add(object) and session.commit() for each one.
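The per-object add/commit pattern pays a commit (and its round trip) for every row; batching the rows and committing once is the usual fix. A minimal sketch of the idea, using the stdlib sqlite3 DBAPI purely so it is self-contained (the table and column names here are made up for illustration, not taken from the schema above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, fiber_number INTEGER, values_json TEXT)"
)

# Build all the rows first, then insert them with executemany inside a
# single transaction -- one commit for the whole batch instead of one
# commit per object.
rows = [(i, str(list(range(4)))) for i in range(1000)]
with conn:  # the with-block commits once at the end
    conn.executemany(
        "INSERT INTO spectrum (fiber_number, values_json) VALUES (?, ?)", rows
    )

count = conn.execute("SELECT COUNT(*) FROM spectrum").fetchone()[0]
print(count)  # 1000
```

The roughly equivalent SQLAlchemy move is to execute a single table insert() with a list of parameter dictionaries (an executemany) and commit once, rather than one session.commit() per object.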
I thought that might be another clue, thanks!

On Jul 19, 1:53 pm, Adrian Price-Whelan <adrian....@gmail.com> wrote:
> Here is some more detailed information from trying the query a few different ways.
>
> Piping the command into psql and writing to a tmp file takes 12 seconds (the tmp file is 241 MB):
>
> time echo "SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk WHERE spectrum_header.filename = 'spPlate-3586-55181.fits';" | psql spectradb -U postgres > tmp
>
> real    0m12.052s
> user    0m2.501s
> sys     0m0.274s
>
> engine.execute on the same query took ~6 seconds:
>
> spectra = engine.execute("SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk WHERE spectrum_header.filename = 'spPlate-3586-55181.fits';")
>
> The full ORM query
>
> spectra = session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename == "spPlate-3665-55247.fits").all()
>
> clocked in at 489 seconds.
>
> Thanks,
> Adrian
>
> On Jul 19, 2010, at 12:24 PM, David Gardner wrote:
> > Try running that query directly against the database and see how long it takes.
> > Also try running EXPLAIN on that query to make sure it is using your indexes properly.
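One caveat on comparing these numbers, echoing Michael's question further down the thread: a timing can cover just statement execution, or execution plus actually fetching every row, and those are different things. A sketch of timing the two phases separately, with sqlite3 standing in for the real driver so the snippet is self-contained:

```python
import sqlite3
import time

# Stand-in database; the real measurement would use the same DBAPI
# connection that SQLAlchemy uses against Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO spectrum (payload) VALUES (?)",
    [("x" * 10000,) for _ in range(1000)],
)

cur = conn.cursor()

t0 = time.perf_counter()
cur.execute("SELECT pk, payload FROM spectrum")
t_execute = time.perf_counter() - t0  # statement execution only

t0 = time.perf_counter()
rows = cur.fetchall()                 # actually materializing every row
t_fetch = time.perf_counter() - t0

print(len(rows))  # 1000; compare t_execute against t_fetch
```

If t_fetch dominates, the cost is in transferring and converting the result rows, not in the query plan.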
> > Since you are only using a single filter, make sure that spectrum_header.filename has an index, and make sure your foreign key column spectrum.spectrum_header_pk is indexed as well.
>
> > On 07/19/2010 08:46 AM, Adrian Price-Whelan wrote:
> >>> does it take a few seconds to fully fetch all the results and it only gets 1000 rows? or is that just to get the initial result?
>
> >> I'm not sure what you mean by this - the query does return 1000 rows.
>
> >>> also if any of the individual columns are very large BLOBs or perhaps very large PG arrays that would add to the overhead of a fetch.
>
> >> There definitely are columns of PG arrays of ~4000 elements each, so back to my first email it seems like the culprit here could be the ARRAYs.
>
> >> Thanks for your help,
> >> Adrian
>
> >> On Jul 19, 10:10 am, Michael Bayer <mike...@zzzcomputing.com> wrote:
> >>> On Jul 19, 2010, at 9:52 AM, Adrian Price-Whelan wrote:
> >>>> Hi,
>
> >>>> I was just wondering at first whether there was a known issue with ARRAY types, but here is the query:
>
> >>>> spectra = session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename == spPlateFilename).all()
>
> >>>> It should return ~1000 objects equaling about 120 MB of data.
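Concretely, the two indexes suggested here would look something like the following (the index names are illustrative; the table and column names come from the query above):

```sql
CREATE INDEX spectrum_header_filename_idx ON spectrum_header (filename);
CREATE INDEX spectrum_spectrum_header_pk_idx ON spectrum (spectrum_header_pk);

-- Then confirm the planner actually uses them:
EXPLAIN ANALYZE
SELECT spectrum.pk
FROM spectrum
JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk
WHERE spectrum_header.filename = 'spPlate-3586-55181.fits';
```

The EXPLAIN output should show index scans rather than sequential scans on both tables if the indexes are being used.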
> >>>> In Python, this query takes >10 minutes to complete, but as a SQL query (copying and pasting the echoed command) it takes a few seconds:
>
> >>>> SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk WHERE spectrum_header.filename = 'spPlate-3586-55181.fits'
>
> >>>> autoflush and autocommit are both set to False.
>
> >>>> It seems like a straightforward query, so I'm confused as to what could be getting hung up.
>
> >>> does it take a few seconds to fully fetch all the results and it only gets 1000 rows? or is that just to get the initial result? these are different things. also if any of the individual columns are very large BLOBs or perhaps very large PG arrays that would add to the overhead of a fetch. You can also try writing a DBAPI-only script that runs the query, as well as running engine.execute(myquery.statement) and fetching rows that way, to see if some in-object process is the factor (which is unlikely).
>
> >>>> Thanks for any insight,
> >>>> Adrian
>
> >>>> On Jul 16, 10:24 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> >>>>> You absolutely need to turn on echoing and locate the specific SQL query which causes the issue. Queries can take excessive time for a very wide variety of reasons.
>
> >>>>> On Jul 16, 2010, at 12:56 PM, Adrian Price-Whelan wrote:
> >>>>>> Hello --
>
> >>>>>> I'm working with a database populated with data originally from a file structure of files that are ~150 MB each.
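One plausible reason the Python fetch is so much slower than psql writing text to a file: each numeric[] value arrives as text and must be parsed element by element into Python objects (for numeric, typically Decimal) for every row. The following is not the driver's actual parser, just a rough, self-contained illustration of that per-element cost at the scale described in this thread:

```python
import time
from decimal import Decimal

# A 4000-element numeric[] value arrives from the server as text
# shaped roughly like '{3.14159,3.14159,...}'.
raw = "{" + ",".join("3.14159" for _ in range(4000)) + "}"

t0 = time.perf_counter()
# Parse the same array text for 100 rows: strip the braces, split on
# commas, and convert each token to Decimal.
rows = [[Decimal(tok) for tok in raw[1:-1].split(",")] for _ in range(100)]
elapsed = time.perf_counter() - t0

print(len(rows), len(rows[0]))  # 100 parsed arrays of 4000 Decimals each
```

Even this toy version does hundreds of thousands of object constructions; at 1000 rows with several such arrays per row, per-element conversion alone can account for a large share of the fetch time.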
> >>>>>> We are dealing with a lot of data that is being stored in the database using the 'ARRAY' datatype, specifically numeric[]. After loading some of the data into the database I tried performing a query to get back some data and comparing it with code that reads directly from the file system - but the database query took ~50 times longer to complete. For instance, to retrieve 100 records that contain a few 4000-element arrays each, the code that reads the filesystem took less than a second, but the query on the database took around 25 seconds to complete. Has anyone else had issues with array types slowing down queries, or does this sound more like another issue?
>
> >>>>>> Thanks!
> >>>>>> Adrian
>
> > --
> > David Gardner
> > Pipeline Tools Programmer
> > Jim Henson Creature Shop
> > dgard...@creatureshop.com
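When a handful of wide array columns dominates fetch time, a common mitigation is to keep them out of the main query and load them only for the rows that need them (SQLAlchemy exposes this as deferred column loading). The idea in plain DBAPI terms, with sqlite3 and made-up column names so the sketch is self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, ra REAL, big_array TEXT)"
)
conn.executemany(
    "INSERT INTO spectrum (ra, big_array) VALUES (?, ?)",
    [(float(i), ",".join("0" for _ in range(4000))) for i in range(100)],
)

# Narrow query: skip the wide column entirely for browsing/filtering ...
narrow = conn.execute("SELECT pk, ra FROM spectrum").fetchall()

# ... and pull the big payload only for the rows that actually need it.
pk = narrow[0][0]
(payload,) = conn.execute(
    "SELECT big_array FROM spectrum WHERE pk = ?", (pk,)
).fetchone()

print(len(narrow), payload.count(",") + 1)  # 100 rows, 4000 elements
```

The narrow query moves a few kilobytes instead of hundreds of megabytes, which is often the difference between the filesystem-speed and database-speed numbers reported above.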
--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.