Also, as a follow-up: inserting these ARRAYs into the database is slower than expected as well. Granted, it is looping over 1000 objects and inserting a few 4000-element arrays for each object, but doing one big SQL query takes considerably less time than calling session.add(object) and session.commit() for each one.
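The per-object add/commit pattern pays a commit (and its round trip) for every row; batching the rows and committing once is the usual fix. A minimal sketch of the idea, using the stdlib sqlite3 DBAPI purely so it is self-contained (the table and column names here are made up for illustration, not taken from the schema above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, fiber_number INTEGER, values_json TEXT)"
)

# Build all the rows first, then insert them with executemany inside a
# single transaction -- one commit for the whole batch instead of one
# commit per object.
rows = [(i, str(list(range(4)))) for i in range(1000)]
with conn:  # the with-block commits once at the end
    conn.executemany(
        "INSERT INTO spectrum (fiber_number, values_json) VALUES (?, ?)", rows
    )

count = conn.execute("SELECT COUNT(*) FROM spectrum").fetchone()[0]
print(count)  # 1000
```

The roughly equivalent SQLAlchemy move is to execute a single table insert() with a list of parameter dictionaries (an executemany) and commit once, rather than one session.commit() per object.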
I thought that might be another clue, thanks!

On Jul 19, 1:53 pm, Adrian Price-Whelan <adrian....@gmail.com> wrote:
> Here is some more detailed information from trying the query a few different ways.
>
> Piping the command into psql and writing to a tmp file takes 12 seconds (the tmp file is 241 MB):
>
> time echo "SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk WHERE spectrum_header.filename = 'spPlate-3586-55181.fits';" | psql spectradb -U postgres > tmp
>
> real    0m12.052s
> user    0m2.501s
> sys     0m0.274s
>
> engine.execute on the same query took ~6 seconds:
>
> spectra = engine.execute("SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk WHERE spectrum_header.filename = 'spPlate-3586-55181.fits';")
>
> The full ORM query
>
> spectra = session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename == "spPlate-3665-55247.fits").all()
>
> clocked in at 489 seconds.
>
> Thanks,
> Adrian
>
> On Jul 19, 2010, at 12:24 PM, David Gardner wrote:
> > Try running that query directly against the database and see how long it takes.
> > Also try running EXPLAIN on that query to make sure it is using your indexes properly.
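One caveat on comparing these numbers, echoing Michael's question further down the thread: a timing can cover just statement execution, or execution plus actually fetching every row, and those are different things. A sketch of timing the two phases separately, with sqlite3 standing in for the real driver so the snippet is self-contained:

```python
import sqlite3
import time

# Stand-in database; the real measurement would use the same DBAPI
# connection that SQLAlchemy uses against Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO spectrum (payload) VALUES (?)",
    [("x" * 10000,) for _ in range(1000)],
)

cur = conn.cursor()

t0 = time.perf_counter()
cur.execute("SELECT pk, payload FROM spectrum")
t_execute = time.perf_counter() - t0  # statement execution only

t0 = time.perf_counter()
rows = cur.fetchall()                 # actually materializing every row
t_fetch = time.perf_counter() - t0

print(len(rows))  # 1000; compare t_execute against t_fetch
```

If t_fetch dominates, the cost is in transferring and converting the result rows, not in the query plan.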
> > Since you are only using a single filter, make sure that spectrum_header.filename has an index, and make sure your foreign key column spectrum.spectrum_header_pk is indexed as well.
>
> > On 07/19/2010 08:46 AM, Adrian Price-Whelan wrote:
> >>> does it take a few seconds to fully fetch all the results and it only gets 1000 rows? or is that just to get the initial result?
>
> >> I'm not sure what you mean by this - the query does return 1000 rows.
>
> >>> also if any of the individual columns are very large BLOBs or perhaps very large PG arrays that would add to the overhead of a fetch.
>
> >> There definitely are columns of PG arrays of ~4000 elements each, so back to my first email it seems like the culprit here could be the ARRAYs.
>
> >> Thanks for your help,
> >> Adrian
>
> >> On Jul 19, 10:10 am, Michael Bayer <mike...@zzzcomputing.com> wrote:
> >>> On Jul 19, 2010, at 9:52 AM, Adrian Price-Whelan wrote:
> >>>> Hi,
>
> >>>> I was just wondering at first whether there was a known issue with ARRAY types, but here is the query:
>
> >>>> spectra = session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename == spPlateFilename).all()
>
> >>>> It should return ~1000 objects equaling about 120 MB of data.
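Concretely, the two indexes suggested here would look something like the following (the index names are illustrative; the table and column names come from the query above):

```sql
CREATE INDEX spectrum_header_filename_idx ON spectrum_header (filename);
CREATE INDEX spectrum_spectrum_header_pk_idx ON spectrum (spectrum_header_pk);

-- Then confirm the planner actually uses them:
EXPLAIN ANALYZE
SELECT spectrum.pk
FROM spectrum
JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk
WHERE spectrum_header.filename = 'spPlate-3586-55181.fits';
```

The EXPLAIN output should show index scans rather than sequential scans on both tables if the indexes are being used.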
> >>>> In Python, this query takes >10 minutes to complete, but as a SQL query (copying and pasting the echoed command) it takes a few seconds:
>
> >>>> SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values, spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk, spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk WHERE spectrum_header.filename = 'spPlate-3586-55181.fits'
>
> >>>> autoflush and autocommit are both set to False.
>
> >>>> It seems like a straightforward query, so I'm confused as to what could be getting hung up.
>
> >>> does it take a few seconds to fully fetch all the results and it only gets 1000 rows? or is that just to get the initial result? these are different things. also if any of the individual columns are very large BLOBs or perhaps very large PG arrays that would add to the overhead of a fetch. You can also try writing a DBAPI-only script that runs the query, as well as running engine.execute(myquery.statement) and fetching rows that way, to see if some in-object process is the factor (which is unlikely).
>
> >>>> Thanks for any insight,
> >>>> Adrian
>
> >>>> On Jul 16, 10:24 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> >>>>> You absolutely need to turn on echoing and locate the specific SQL query which causes the issue. Queries can take excessive time for a very wide variety of reasons.
>
> >>>>> On Jul 16, 2010, at 12:56 PM, Adrian Price-Whelan wrote:
> >>>>>> Hello --
>
> >>>>>> I'm working with a database populated with data originally from a file structure of files that are ~150 MB each.
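One plausible reason the Python fetch is so much slower than psql writing text to a file: each numeric[] value arrives as text and must be parsed element by element into Python objects (for numeric, typically Decimal) for every row. The following is not the driver's actual parser, just a rough, self-contained illustration of that per-element cost at the scale described in this thread:

```python
import time
from decimal import Decimal

# A 4000-element numeric[] value arrives from the server as text
# shaped roughly like '{3.14159,3.14159,...}'.
raw = "{" + ",".join("3.14159" for _ in range(4000)) + "}"

t0 = time.perf_counter()
# Parse the same array text for 100 rows: strip the braces, split on
# commas, and convert each token to Decimal.
rows = [[Decimal(tok) for tok in raw[1:-1].split(",")] for _ in range(100)]
elapsed = time.perf_counter() - t0

print(len(rows), len(rows[0]))  # 100 parsed arrays of 4000 Decimals each
```

Even this toy version does hundreds of thousands of object constructions; at 1000 rows with several such arrays per row, per-element conversion alone can account for a large share of the fetch time.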
> >>>>>> We are dealing with a lot of data that is being stored in the database using the 'ARRAY' datatype, specifically numeric[]. After loading some of the data into the database I tried performing a query to get back some data and comparing it with code that reads directly from the file system - but the database query took ~50 times longer to complete. For instance, to retrieve 100 records that contain a few 4000-element arrays each, the code that reads the filesystem took less than a second, but the query on the database took around 25 seconds to complete. Has anyone else had issues with array types slowing down queries, or does this sound more like another issue?
>
> >>>>>> Thanks!
> >>>>>> Adrian
>
> > --
> > David Gardner
> > Pipeline Tools Programmer
> > Jim Henson Creature Shop
> > dgard...@creatureshop.com
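When a handful of wide array columns dominates fetch time, a common mitigation is to keep them out of the main query and load them only for the rows that need them (SQLAlchemy exposes this as deferred column loading). The idea in plain DBAPI terms, with sqlite3 and made-up column names so the sketch is self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, ra REAL, big_array TEXT)"
)
conn.executemany(
    "INSERT INTO spectrum (ra, big_array) VALUES (?, ?)",
    [(float(i), ",".join("0" for _ in range(4000))) for i in range(100)],
)

# Narrow query: skip the wide column entirely for browsing/filtering ...
narrow = conn.execute("SELECT pk, ra FROM spectrum").fetchall()

# ... and pull the big payload only for the rows that actually need it.
pk = narrow[0][0]
(payload,) = conn.execute(
    "SELECT big_array FROM spectrum WHERE pk = ?", (pk,)
).fetchone()

print(len(narrow), payload.count(",") + 1)  # 100 rows, 4000 elements
```

The narrow query moves a few kilobytes instead of hundreds of megabytes, which is often the difference between the filesystem-speed and database-speed numbers reported above.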
--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.