Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-02 Thread David Jarvis
Hi, Hmm, that's nice, though I cannot but wonder whether the exclusive lock > required by CLUSTER is going to be a problem in the long run. > Not an issue; the inserts are one-time (or very rare; at most: once a year). > Hm, keep in mind that if the station clause alone is not selective > enou

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-02 Thread David Jarvis
Sorry, Alvaro. I was contemplating using a GIN or GiST index as a way of optimizing the query. Instead, I found that re-inserting the data in order of station ID (the primary look-up column) and then CLUSTER'ing on the station ID, taken date, and category index increased the speed by an order of

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Alvaro Herrera
Excerpts from David Jarvis's message of mar jun 01 14:01:22 -0400 2010: > Sorry, Alvaro. > > I was contemplating using a GIN or GiST index as a way of optimizing the > query. My fault -- I didn't read the whole thread. > Instead, I found that re-inserting the data in order of station ID (the > p

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Alvaro Herrera
Excerpts from Matthew Wakeling's message of mar jun 01 05:55:35 -0400 2010: > On Sun, 23 May 2010, David Jarvis wrote: > > The measurement table indexes (on date and weather station) were not being > > used because the only given date ranges (e.g., 1900 - 2009) were causing the > > planner to do a

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Matthew Wakeling
On Sun, 23 May 2010, David Jarvis wrote: The measurement table indexes (on date and weather station) were not being used because the only given date ranges (e.g., 1900 - 2009) were causing the planner to do a full table scan, which is correct. I wonder if you might see some benefit from CLUSTER

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-23 Thread David Jarvis
Hi, The problem is now solved (in theory). Well, it's not the functions per se that's the problem, it's the lack of > a useful index on the expression. > The measurement table indexes (on date and weather station) were not being used because the only given date ranges (e.g., 1900 - 2009) were ca

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-22 Thread David Jarvis
Hi, certainly understand that you wouldn't want to partition by year. It > Definitely not. > does strike me that perhaps you could partition by day ranges, but you'd > I don't think that will work; users can choose any day range, with the most common as Jan 1 - Dec 31, followed by seasonal ra

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-22 Thread David Jarvis
Hi, CREATE INDEX measurement_01_001_y_idx >> ON climate.measurement_01_001 >> USING btree >> (date_part('year'::text, taken)); >> >> Is that equivalent to what you suggest? >> > > No. It is not the same function, so Postgres has no way to know it produces > the same results (if it does). > Thi

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Stephen Frost
* Yeb Havinga (yebhavi...@gmail.com) wrote: >> Normalizing by date parts was fast. Partitioning the tables by year >> won't do much good -- users will probably choose 1900 to 2009, >> predominately. > Ok, in that case it is a bad idea. Yeah, now that I understand what the user actually wants,

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Matthew Wakeling
Regarding the leap year problem, you might consider creating a modified day of year field, which always assumes that the year contains a leap day. Then a given number always resolves to a given date, regardless of year. If you then partition (or index) on that field, then you may get a benefit.

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Yeb Havinga
David Jarvis wrote: Hi, Yeb. This is starting to go back to the design I used with MySQL: * YEAR_REF - Has year and station * MONTH_REF - Has month, category, and yea referencer * MEASUREMENT - Has month reference, amount, and day Normalizing by date parts was fast. Partitioning th

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Yeb Havinga
Matthew Wakeling wrote: On Fri, 21 May 2010, Yeb Havinga wrote: For time based data I would for sure go for year based indexing. On the contrary, most of the queries seem to be over many years, but rather restricting on the time of year. Therefore, partitioning by month or some other per-yea

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread David Jarvis
Hi, Yeb. This is starting to go back to the design I used with MySQL: - YEAR_REF - Has year and station - MONTH_REF - Has month, category, and yea referencer - MEASUREMENT - Has month reference, amount, and day Normalizing by date parts was fast. Partitioning the tables by year won't do

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Matthew Wakeling
On Fri, 21 May 2010, Yeb Havinga wrote: For time based data I would for sure go for year based indexing. On the contrary, most of the queries seem to be over many years, but rather restricting on the time of year. Therefore, partitioning by month or some other per-year method would seem sensi

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Yeb Havinga
There is a thing that might lead to confusion in the previous post: create or replace function yearmod(int) RETURNS int as 'select (($1 >> 2) %32);' language sql immutable strict; is equivalent with create or replace function yearmod(int) RETURNS int as 'select (($1 / 4) %32);' language sql imm

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Yeb Havinga
David Jarvis wrote: Also, you're trying to do constraint_exclusion, but have you made sure that it's turned on? And have you made sure that those constraints are really the right ones and that they make sense? You're using a bunch of extract()'s there too, why not just

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, (An EXPLAIN ANALYSE would be better here). Look at the expected number of > stations "Nested Loop (cost=0.00..994.94 rows=4046 width=4) (actual time=0.053..41.173 rows=78 loops=1)" " Join Filter: ((6371.009::double precision * sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Matthew Wakeling
On Thu, 20 May 2010, David Jarvis wrote: I took out the date conditions: SELECT m.* FROM climate.measurement m WHERE m.category_id = 1 and m.station_id = 2043 This uses the station indexes: Yes, because there is only one station_id selected. That's exactly what an index is for. Then c

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
I took out the date conditions: SELECT m.* FROM climate.measurement m WHERE m.category_id = 1 and m.station_id = 2043 This uses the station indexes: "Result (cost=0.00..21781.18 rows=8090 width=28)" " -> Append (cost=0.00..21781.18 rows=8090 width=28)" "-> Seq Scan on measur

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, Something in here really smells fishy to me. Those extract's above are > working on values which are from the table.. Why aren't you using these > functions to figure out how to construct the actual dates based on the > values provided by the *user*..? > Because I've only been using Postgre

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, check (taken >= '1913-12-01' and taken <= '1913-12-31') > I don't think I want to constrain by year, for a few reasons: 1. There are a lot of years -- over 110. 2. There will be more years added (both in the future for 2010 and in the past as I get data from other sources). Currently

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, ~300 million measurements ~12000 stations (not 7 as I mentioned before) ~5500 cities some serious data tho, at least. Basically, PG is sequentially scanning > through all of the tables in your partitioning setup. What is > constraint_exclusion set to? What version of PG is this? Do th

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Stephen Frost
* David Jarvis (thanga...@gmail.com) wrote: > I was still referring to the measurement table. You have an index on > > stationid, but still seem to be getting a sequential scan. Maybe the planner > > does not realise that you are selecting a small number of stations. Posting > > an EXPLAIN ANALYSE

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Stephen Frost
* David Jarvis (thanga...@gmail.com) wrote: > There are 72 child tables, each having a year index and a station index, > which are defined as follows: S, my thoughts: Partition by something that makes sense... Typically, I'd say that you would do it by the category id and when the measuremen

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Yeb Havinga
Tom Lane wrote: David Jarvis writes: I was hoping to eliminate this part of the query: (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date ), 0

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Stephen Frost
* David Jarvis (thanga...@gmail.com) wrote: > I was hoping to eliminate this part of the query: > (cast(extract( YEAR FROM m.taken ) + greatest( -1 * > sign( > (extract( YEAR FROM m.taken )||'-12-31')::date - > (extract( YEAR FROM m.taken )||'-01-01')::date

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
The greatest() expression reduces to either the current year (year + 0) or the next year (year + 1) by taking the sign of the difference in start/end days. This allows me to derive an end date, such as: Dec 22, 1900 to Mar 22, 1901 Then I check if the measured date falls between those two dates.

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, I was still referring to the measurement table. You have an index on > stationid, but still seem to be getting a sequential scan. Maybe the planner > does not realise that you are selecting a small number of stations. Posting > an EXPLAIN ANALYSE would really help here. > Here is the result f

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Tom Lane
David Jarvis writes: > I was hoping to eliminate this part of the query: > (cast(extract( YEAR FROM m.taken ) + greatest( -1 * > sign( > (extract( YEAR FROM m.taken )||'-12-31')::date - > (extract( YEAR FROM m.taken )||'-01-01')::date ), 0 > ) AS t

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
I was hoping to eliminate this part of the query: (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date ), 0 ) AS text)||'-12-31')::date That uses

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Tom Lane
David Jarvis writes: > What if I were to have the application pass in two sets of date ranges? > For the condition of Dec 22 to Mar 22: > Dec 22 would become: >- Dec 22 - Dec 31 > Mar 22 would become: >- Jan 1 - Mar 22 I think what you're essentially describing here is removing the OR fro

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
What if I were to have the application pass in two sets of date ranges? For the condition of Dec 22 to Mar 22: Dec 22 would become: - Dec 22 - Dec 31 Mar 22 would become: - Jan 1 - Mar 22 The first range would always be for the current year; the second range would always be for the year

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Tom Lane
David Jarvis writes: > I was thinking that I could add three more columns to the measurement table: > year_taken, month_taken, day_taken > Then index those. That should allow me to avoid extracting years, months, > and days from the *m.taken* date column. You could, but I don't think there's any

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
When using MySQL, the performance was okay (~5 seconds per query) using: date( concat_ws( '-', y.year, m.month, d.day ) ) between -- Start date. date( concat_ws( '-', y.year, $P{Month1}, $P{Day1} ) ) AND -- End date. Calculated by checking to see if the end date wraps -- into the

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Thom Brown
On 20 May 2010 20:02, Tom Lane wrote: > Thom Brown writes: >> On 20 May 2010 17:36, David Jarvis wrote: >> Okay, get your app to convert the month-date to a day of year, so we >> have year_start, year_end, day_of_year_start, day_of_year_end > >> and your where clause would something like this: >

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Tom Lane
Thom Brown writes: > On 20 May 2010 17:36, David Jarvis wrote: > Okay, get your app to convert the month-date to a day of year, so we > have year_start, year_end, day_of_year_start, day_of_year_end > and your where clause would something like this: > WHERE extract(YEAR from m.taken) BETWEEN yea

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Thom Brown
On 20 May 2010 19:36, Thom Brown wrote: > On 20 May 2010 17:36, David Jarvis wrote: >> Hi, Thom. >> >> The query is given two items: >> >> Range of years >> Range of days >> >> I need to select all data between the range of days (e.g., Dec 22 - Mar 22) >> over the range of years (e.g., 1950 - 197

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Thom Brown
On 20 May 2010 17:36, David Jarvis wrote: > Hi, Thom. > > The query is given two items: > > Range of years > Range of days > > I need to select all data between the range of days (e.g., Dec 22 - Mar 22) > over the range of years (e.g., 1950 - 1970), such as shown here: > > http://i.imgur.com/MUkuZ

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, I have posted an image of the user inputs here: http://i.imgur.com/MUkuZ.png The problem is that I am given a range of days (Dec 22 - Mar 22) over a range of years (1900 - 2009) and the range of days can span from one year to the next. This is not the same as saying Dec 22, 1900 to Mar 22, 2

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Tom Lane
Matthew Wakeling writes: > On Wed, 19 May 2010, David Jarvis wrote: >> extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND > That portion of the WHERE clause cannot use an index on m.taken. Postgres > does not look inside functions (like extract) to see if something > indexable is present. T

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Matthew Wakeling
On Wed, 19 May 2010, David Jarvis wrote: extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND That portion of the WHERE clause cannot use an index on m.taken. Postgres does not look inside functions (like extract) to see if something indexable is present. To get an index to work, you c

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Thom Brown
On 20 May 2010 06:06, David Jarvis wrote: > Hi, > > I recently switched to PostgreSQL from MySQL so that I can use PL/R for data > analysis. The query in MySQL form (against a more complex table structure) > takes ~5 seconds to run. The query in PostgreSQL I have yet to let finish, > as it takes o

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Yeb Havinga
Hello David, The table aggregates 237 million rows from its child tables. The sluggishness comes from this part of the query: m.taken BETWEEN /* Start date. */ (extract( YEAR FROM m.taken )||'-01-01')::date AND /* End date. Calculated by checking to see if the end da

[PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-19 Thread David Jarvis
Hi, I recently switched to PostgreSQL from MySQL so that I can use PL/R for data analysis. The query in MySQL form (against a more complex table structure) takes ~5 seconds to run. The query in PostgreSQL I have yet to let finish, as it takes over a minute. I think I have the correct table structu