Here is some more detailed information from trying the query multiple ways. Piping the command into psql and writing the output to a tmp file takes 12 seconds (the tmp file is 241MB):
time echo "SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk WHERE spectrum_header.filename = 'spPlate-3586-55181.fits';" | psql spectradb -U postgres > tmp

real    0m12.052s
user    0m2.501s
sys     0m0.274s

engine.execute on the same query took ~6 seconds:

spectra = engine.execute("SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk WHERE spectrum_header.filename = 'spPlate-3586-55181.fits';")

The ORM version:

spectra = session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename == "spPlate-3665-55247.fits").all()

clocked in at 489 seconds.

Thanks,
Adrian

On Jul 19, 2010, at 12:24 PM, David Gardner wrote:

> Try running that query directly against the database and see how long that takes.
> Also try running EXPLAIN on that query to make sure it is using your indexes
> properly.
> Since you are only using a single filter, make sure that
> spectrum_header.filename
> has an index, and make sure your foreign key column
> spectrum.spectrum_header_pk
> is indexed as well.
>
> On 07/19/2010 08:46 AM, Adrian Price-Whelan wrote:
>>> does it take a few seconds to fully fetch all the results and it only gets
>>> 1000 rows? or is that just to get the initial result?
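David's checklist (run EXPLAIN, index the filter column and the foreign key column) can be sketched end to end. The snippet below is only an illustration using the stdlib sqlite3 module and its EXPLAIN QUERY PLAN as a stand-in; on Postgres you would run EXPLAIN on the real query in psql, and the index names here are made up:

```python
import sqlite3

# Stand-in schema mirroring the thread's tables (sqlite3 used only to
# illustrate the check; index names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spectrum_header (pk INTEGER PRIMARY KEY, filename TEXT);
    CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, spectrum_header_pk INTEGER);
    -- the two indexes David recommends: the filter column and the FK column
    CREATE INDEX ix_header_filename ON spectrum_header (filename);
    CREATE INDEX ix_spectrum_header_pk ON spectrum (spectrum_header_pk);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT spectrum.pk
    FROM spectrum
    JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk
    WHERE spectrum_header.filename = 'spPlate-3586-55181.fits'
""").fetchall()

for row in plan:
    print(row)  # the detail strings show which access path the planner picked
```

If the plan shows a full scan of spectrum_header instead of an index search on the filename column, the missing index is the first thing to fix.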
>>
>> I'm not sure what you mean by this - the query does return 1000 rows.
>>
>>> also if any of the individual columns are very large BLOBs or perhaps very
>>> large PG arrays that would add to the overhead of a fetch.
>>
>> There definitely are columns of PG arrays, ~4000 elements each, so back
>> to my first email it seems like the culprit here could be the ARRAYs.
>>
>> Thanks for your help,
>>
>> Adrian
>>
>> On Jul 19, 10:10 am, Michael Bayer <mike...@zzzcomputing.com> wrote:
>>> On Jul 19, 2010, at 9:52 AM, Adrian Price-Whelan wrote:
>>>> Hi,
>>>>
>>>> I was just wondering at first whether there was a known issue with
>>>> ARRAY types, but here is the query:
>>>>
>>>> spectra = session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename == spPlateFilename).all()
>>>>
>>>> It should return ~1000 objects equaling about 120MB of data. In
>>>> Python, this query takes >10 minutes to complete, but as a SQL query
>>>> (copying and pasting the echoed command) it takes a few seconds:
>>>>
>>>> SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra,
>>>> spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values,
>>>> spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk,
>>>> spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS
>>>> spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask,
>>>> spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN
>>>> spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk
>>>> WHERE spectrum_header.filename = 'spPlate-3586-55181.fits'
>>>>
>>>> autoflush and autocommit are both set to False.
>>>>
>>>> It seems like a straightforward query, so I'm confused as to what could
>>>> be getting hung up.
>>>
>>> does it take a few seconds to fully fetch all the results and it only gets
>>> 1000 rows? or is that just to get the initial result? these are different
>>> things.
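Since the suspicion above falls on the ~4000-element numeric[] arrays, here is a rough, self-contained illustration of the per-element conversion cost a driver pays when it turns array literals into Python objects. The parser below is a simplified stand-in written for this sketch, not psycopg2's actual code, but the shape of the work (one Decimal per element, per row) is comparable:

```python
import time
from decimal import Decimal

def parse_numeric_array(literal):
    """Parse a Postgres-style numeric[] literal like '{1.25,2.5}' into a
    list of Decimals -- a simplified stand-in for the driver's own parser."""
    return [Decimal(tok) for tok in literal.strip("{}").split(",")]

# One row's worth of data as described in the thread: a ~4000-element array.
literal = "{" + ",".join("%d.25" % i for i in range(4000)) + "}"

start = time.time()
rows = [parse_numeric_array(literal) for _ in range(100)]  # 100 of the ~1000 rows
elapsed = time.time() - start

print("parsed %d rows of %d elements in %.3fs" % (len(rows), len(rows[0]), elapsed))
```

Scaled to 1000 rows, that is roughly 4 million Decimal constructions per query before the ORM even sees the data, which is consistent with arrays dominating the fetch time.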
>>> also if any of the individual columns are very large BLOBs or
>>> perhaps very large PG arrays that would add to the overhead of a fetch.
>>> You can also try writing a DBAPI-only script that runs the query, as well
>>> as running engine.execute(myquery.statement) and fetching rows that way to
>>> see if some in-object process is the factor (which is unlikely).
>>>
>>>> Thanks for any insight,
>>>>
>>>> Adrian
>>>>
>>>> On Jul 16, 10:24 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
>>>>> You absolutely need to turn on echoing and locate the specific SQL query
>>>>> which causes the issue. Queries can take excessive time for a very wide
>>>>> variety of reasons.
>>>>>
>>>>> On Jul 16, 2010, at 12:56 PM, Adrian Price-Whelan wrote:
>>>>>> Hello --
>>>>>>
>>>>>> I'm working with a database populated with data originally from a file
>>>>>> structure of files that are ~150MB each. We are dealing with a lot of
>>>>>> data that is being stored in the database using the ARRAY datatype,
>>>>>> specifically numeric[]. After loading some of the data into the database
>>>>>> I tried performing a query to get back some data and comparing it with
>>>>>> code that reads directly from the file system - but the database query
>>>>>> took ~50 times longer to complete. For instance, to retrieve 100 records
>>>>>> that contain a few 4000-element arrays each, the code that reads the
>>>>>> filesystem took less than a second, but the query on the database
>>>>>> took around 25 seconds to complete. Has anyone else had issues with
>>>>>> array types slowing down queries, or does this sound more like another
>>>>>> issue?
>>>>>>
>>>>>> Thanks!
>>>>>> Adrian
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "sqlalchemy" group.
>>>>>> To post to this group, send email to sqlalch...@googlegroups.com.
>>>>>> To unsubscribe from this group, send email to
>>>>>> sqlalchemy+unsubscr...@googlegroups.com.
>>>>>> For more options, visit this group
>>>>>> at http://groups.google.com/group/sqlalchemy?hl=en.
>
> --
> David Gardner
> Pipeline Tools Programmer
> Jim Henson Creature Shop
> dgard...@creatureshop.com
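Michael's suggestion of a DBAPI-only script separates the initial execute from the full fetch, which is exactly the distinction he asks about. A minimal sketch of that shape, using the stdlib sqlite3 module as a stand-in DBAPI (the assumption being that psycopg2 splits the work the same way between execute() and fetchall()):

```python
import sqlite3
import time

# sqlite3 stands in for the Postgres DBAPI here; the table is a rough
# analogue of the thread's spectrum table with one wide column per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, vals TEXT)")
conn.executemany(
    "INSERT INTO spectrum (vals) VALUES (?)",
    [(",".join(["1.0"] * 4000),) for _ in range(1000)],  # ~1000 wide rows
)

cur = conn.cursor()
t0 = time.time()
cur.execute("SELECT pk, vals FROM spectrum")  # the initial result only
t1 = time.time()
rows = cur.fetchall()                         # the full fetch of every row
t2 = time.time()

print("execute %.4fs  fetchall %.4fs  rows %d" % (t1 - t0, t2 - t1, len(rows)))
```

If a script like this against the real database is fast while session.query(...).all() is not, the remaining cost sits in row conversion and object construction above the driver; if the fetchall() itself is slow, the array deserialization in the driver is the bottleneck.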