Hi,
> Hmm, that's nice, though I cannot but wonder whether the exclusive lock
> required by CLUSTER is going to be a problem in the long run.
>
Not an issue; the inserts are one-time (or very rare; at most: once a year).
> Hm, keep in mind that if the station clause alone is not selective
> enou
Sorry, Alvaro.
I was contemplating using a GIN or GiST index as a way of optimizing the
query.
Instead, I found that re-inserting the data in order of station ID (the
primary look-up column) and then CLUSTER'ing on the station ID, taken date,
and category index increased the speed by an order of magnitude.
Excerpts from David Jarvis's message of mar jun 01 14:01:22 -0400 2010:
> Sorry, Alvaro.
>
> I was contemplating using a GIN or GiST index as a way of optimizing the
> query.
My fault -- I didn't read the whole thread.
> Instead, I found that re-inserting the data in order of station ID (the
> primary look-up column) and then CLUSTER'ing on the station ID, taken date,
> and category index increased the speed by an order of magnitude.
Excerpts from Matthew Wakeling's message of mar jun 01 05:55:35 -0400 2010:
> On Sun, 23 May 2010, David Jarvis wrote:
> > The measurement table indexes (on date and weather station) were not being
> > used because the only given date ranges (e.g., 1900 - 2009) were causing the
> > planner to do a full table scan, which is correct.
On Sun, 23 May 2010, David Jarvis wrote:
> The measurement table indexes (on date and weather station) were not being
> used because the only given date ranges (e.g., 1900 - 2009) were causing the
> planner to do a full table scan, which is correct.
I wonder if you might see some benefit from CLUSTER
Hi,
The problem is now solved (in theory).
> Well, it's not the functions per se that's the problem, it's the lack of
> a useful index on the expression.
>
The measurement table indexes (on date and weather station) were not being
used because the only given date ranges (e.g., 1900 - 2009) were causing the
planner to do a full table scan, which is correct.
Hi,
> I certainly understand that you wouldn't want to partition by year. It
>
Definitely not.
> does strike me that perhaps you could partition by day ranges, but you'd
>
I don't think that will work; users can choose any day range, with the most
common as Jan 1 - Dec 31, followed by seasonal ranges.
Hi,
>> CREATE INDEX measurement_01_001_y_idx
>> ON climate.measurement_01_001
>> USING btree
>> (date_part('year'::text, taken));
>>
>> Is that equivalent to what you suggest?
>>
>
> No. It is not the same function, so Postgres has no way to know it produces
> the same results (if it does).
>
Thi
* Yeb Havinga (yebhavi...@gmail.com) wrote:
>> Normalizing by date parts was fast. Partitioning the tables by year
>> won't do much good -- users will probably choose 1900 to 2009,
>> predominately.
> Ok, in that case it is a bad idea.
Yeah, now that I understand what the user actually wants,
Regarding the leap year problem, you might consider creating a modified day
of year field, which always assumes that the year contains a leap day. Then
a given number always resolves to a given date, regardless of year. If you
then partition (or index) on that field, then you may get a benefit.
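The modified day-of-year idea is easy to sketch outside SQL. A minimal Python illustration (the function name `leap_doy` is mine, not from the thread): the month/day is mapped to its ordinal in a fixed leap year, so the same calendar day always resolves to the same number regardless of year.

```python
from datetime import date

def leap_doy(month: int, day: int) -> int:
    """Ordinal of (month, day) in a fixed leap year (2000 here), so a
    given calendar day gets the same number in every year."""
    return date(2000, month, day).timetuple().tm_yday

# Mar 1 is always day 61 under this scheme (31 + 29 + 1), whereas the
# plain day-of-year flips between 60 and 61 depending on the year.
```

An index or partition key built on such a value stays stable across leap and non-leap years.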
David Jarvis wrote:
Hi, Yeb.
This is starting to go back to the design I used with MySQL:
* YEAR_REF - Has year and station
* MONTH_REF - Has month, category, and year reference
* MEASUREMENT - Has month reference, amount, and day
Normalizing by date parts was fast. Partitioning the tables by year won't do
much good -- users will probably choose 1900 to 2009, predominately.
Matthew Wakeling wrote:
On Fri, 21 May 2010, Yeb Havinga wrote:
For time based data I would for sure go for year based indexing.
On the contrary, most of the queries seem to be over many years, but
rather restricting on the time of year. Therefore, partitioning by
month or some other per-year method would seem sensible.
Hi, Yeb.
This is starting to go back to the design I used with MySQL:
- YEAR_REF - Has year and station
- MONTH_REF - Has month, category, and year reference
- MEASUREMENT - Has month reference, amount, and day
Normalizing by date parts was fast. Partitioning the tables by year won't do
much good -- users will probably choose 1900 to 2009, predominately.
On Fri, 21 May 2010, Yeb Havinga wrote:
For time based data I would for sure go for year based indexing.
On the contrary, most of the queries seem to be over many years, but
rather restricting on the time of year. Therefore, partitioning by month
or some other per-year method would seem sensible.
There is a thing that might lead to confusion in the previous post:
create or replace function yearmod(int) RETURNS int
as 'select (($1 >> 2) %32);'
language sql
immutable
strict;
is equivalent with
create or replace function yearmod(int) RETURNS int
as 'select (($1 / 4) %32);'
language sql
immutable
strict;
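The equivalence holds because an arithmetic right shift by two is integer division by four for non-negative values. A quick Python check, with names mirroring the SQL function:

```python
def yearmod_shift(year: int) -> int:
    # Mirrors: select (($1 >> 2) % 32)
    return (year >> 2) % 32

def yearmod_div(year: int) -> int:
    # Mirrors: select (($1 / 4) % 32)
    return (year // 4) % 32

# The two agree for every non-negative year, and change bucket
# only every four years -- 32 buckets covering a 128-year cycle.
assert all(yearmod_shift(y) == yearmod_div(y) for y in range(0, 3000))
```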
David Jarvis wrote:
Also, you're trying to do constraint_exclusion, but have you made sure
that it's turned on? And have you made sure that those constraints are
really the right ones and that they make sense? You're using a bunch of
extract()'s there too, why not just
Hi,
> (An EXPLAIN ANALYSE would be better here). Look at the expected number of
> stations
"Nested Loop (cost=0.00..994.94 rows=4046 width=4) (actual
time=0.053..41.173 rows=78 loops=1)"
" Join Filter: ((6371.009::double precision *
sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::
On Thu, 20 May 2010, David Jarvis wrote:
I took out the date conditions:
SELECT
m.*
FROM
climate.measurement m
WHERE
m.category_id = 1 and
m.station_id = 2043
This uses the station indexes:
Yes, because there is only one station_id selected. That's exactly what an
index is for.
Then c
I took out the date conditions:
SELECT
m.*
FROM
climate.measurement m
WHERE
m.category_id = 1 and
m.station_id = 2043
This uses the station indexes:
"Result (cost=0.00..21781.18 rows=8090 width=28)"
" -> Append (cost=0.00..21781.18 rows=8090 width=28)"
"-> Seq Scan on measur
Hi,
Something in here really smells fishy to me. Those extract's above are
> working on values which are from the table.. Why aren't you using these
> functions to figure out how to construct the actual dates based on the
> values provided by the *user*..?
>
Because I've only been using PostgreSQL
Hi,
> check (taken >= '1913-12-01' and taken <= '1913-12-31')
>
I don't think I want to constrain by year, for a few reasons:
1. There are a lot of years -- over 110.
2. There will be more years added (both in the future for 2010 and in the
past as I get data from other sources).
Currently
Hi,
~300 million measurements
~12000 stations (not 7 as I mentioned before)
~5500 cities
> some serious data though, at least. Basically, PG is sequentially scanning
> through all of the tables in your partitioning setup. What is
> constraint_exclusion set to? What version of PG is this? Do th
* David Jarvis (thanga...@gmail.com) wrote:
> I was still referring to the measurement table. You have an index on
> > stationid, but still seem to be getting a sequential scan. Maybe the planner
> > does not realise that you are selecting a small number of stations. Posting
> > an EXPLAIN ANALYSE would really help here.
* David Jarvis (thanga...@gmail.com) wrote:
> There are 72 child tables, each having a year index and a station index,
> which are defined as follows:
So, my thoughts:
Partition by something that makes sense... Typically, I'd say that you
would do it by the category id and when the measuremen
Tom Lane wrote:
David Jarvis writes:
I was hoping to eliminate this part of the query:
(cast(extract( YEAR FROM m.taken ) + greatest( -1 *
sign(
(extract( YEAR FROM m.taken )||'-12-31')::date -
(extract( YEAR FROM m.taken )||'-01-01')::date ), 0
) AS text)||'-12-31')::date
* David Jarvis (thanga...@gmail.com) wrote:
> I was hoping to eliminate this part of the query:
> (cast(extract( YEAR FROM m.taken ) + greatest( -1 *
> sign(
> (extract( YEAR FROM m.taken )||'-12-31')::date -
> (extract( YEAR FROM m.taken )||'-01-01')::date ), 0
> ) AS text)||'-12-31')::date
The greatest() expression reduces to either the current year (year + 0) or
the next year (year + 1) by taking the sign of the difference in start/end
days. This allows me to derive an end date, such as:
Dec 22, 1900 to Mar 22, 1901
Then I check if the measured date falls between those two dates.
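That year-rollover logic is easier to see outside SQL. A small Python sketch of the same idea (the function and parameter names are mine, not from the query):

```python
from datetime import date

def end_of_range(year, start, end):
    """Resolve the end of a day range beginning in `year`.
    `start` and `end` are (month, day) pairs. If the range wraps past
    Dec 31 (e.g. Dec 22 - Mar 22), the end rolls into the next year."""
    start_date = date(year, *start)
    end_date = date(year, *end)
    if end_date < start_date:  # plays the role of greatest(-1 * sign(...), 0)
        end_date = end_date.replace(year=year + 1)
    return end_date

# Dec 22, 1900 to Mar 22 resolves to Mar 22, 1901:
assert end_of_range(1900, (12, 22), (3, 22)) == date(1901, 3, 22)
```

The SQL expression has to rebuild both candidate dates from extract() per row, which is what defeats the index on m.taken.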
Hi,
> I was still referring to the measurement table. You have an index on
> stationid, but still seem to be getting a sequential scan. Maybe the planner
> does not realise that you are selecting a small number of stations. Posting
> an EXPLAIN ANALYSE would really help here.
>
Here is the result f
David Jarvis writes:
> I was hoping to eliminate this part of the query:
> (cast(extract( YEAR FROM m.taken ) + greatest( -1 *
> sign(
> (extract( YEAR FROM m.taken )||'-12-31')::date -
> (extract( YEAR FROM m.taken )||'-01-01')::date ), 0
> ) AS text)||'-12-31')::date
I was hoping to eliminate this part of the query:
(cast(extract( YEAR FROM m.taken ) + greatest( -1 *
sign(
(extract( YEAR FROM m.taken )||'-12-31')::date -
(extract( YEAR FROM m.taken )||'-01-01')::date ), 0
) AS text)||'-12-31')::date
That uses
David Jarvis writes:
> What if I were to have the application pass in two sets of date ranges?
> For the condition of Dec 22 to Mar 22:
> Dec 22 would become:
>- Dec 22 - Dec 31
> Mar 22 would become:
>- Jan 1 - Mar 22
I think what you're essentially describing here is removing the OR from
What if I were to have the application pass in two sets of date ranges?
For the condition of Dec 22 to Mar 22:
Dec 22 would become:
- Dec 22 - Dec 31
Mar 22 would become:
- Jan 1 - Mar 22
The first range would always be for the current year; the second range would
always be for the following year.
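The two-range idea amounts to splitting a wrapping day range at the year boundary before it reaches the database. A hedged Python sketch (names are mine, not from the thread):

```python
from datetime import date

def split_day_range(year, start, end):
    """Split one possibly-wrapping day range for `year` into ranges that
    each stay inside a single calendar year. `start` and `end` are
    (month, day) pairs."""
    if start <= end:  # e.g. Jan 1 - Dec 31: no wrap
        return [(date(year, *start), date(year, *end))]
    # Wrapping range, e.g. Dec 22 - Mar 22:
    return [
        (date(year, *start), date(year, 12, 31)),      # current year
        (date(year + 1, 1, 1), date(year + 1, *end)),  # following year
    ]

split_day_range(1950, (12, 22), (3, 22))
# [(1950-12-22, 1950-12-31), (1951-01-01, 1951-03-22)]
```

Each emitted range is a plain BETWEEN on the date column, so an ordinary index on taken can serve it.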
David Jarvis writes:
> I was thinking that I could add three more columns to the measurement table:
> year_taken, month_taken, day_taken
> Then index those. That should allow me to avoid extracting years, months,
> and days from the *m.taken* date column.
You could, but I don't think there's any
When using MySQL, the performance was okay (~5 seconds per query) using:
date( concat_ws( '-', y.year, m.month, d.day ) ) between
-- Start date.
date( concat_ws( '-', y.year, $P{Month1}, $P{Day1} ) ) AND
-- End date. Calculated by checking to see if the end date wraps
-- into the
On 20 May 2010 20:02, Tom Lane wrote:
> Thom Brown writes:
>> On 20 May 2010 17:36, David Jarvis wrote:
>> Okay, get your app to convert the month-date to a day of year, so we
>> have year_start, year_end, day_of_year_start, day_of_year_end
>
>> and your where clause would be something like this:
>
Thom Brown writes:
> On 20 May 2010 17:36, David Jarvis wrote:
> Okay, get your app to convert the month-date to a day of year, so we
> have year_start, year_end, day_of_year_start, day_of_year_end
> and your where clause would be something like this:
> WHERE extract(YEAR from m.taken) BETWEEN year_start AND year_end
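The day-of-year clause needs one subtlety: when the window wraps past Dec 31, the two day-of-year bounds must be OR'ed rather than BETWEEN'ed. A Python analogue of the suggested predicate (names here are illustrative, not from the thread):

```python
from datetime import date

def row_matches(taken, year_start, year_end, doy_start, doy_end):
    """Analogue of: extract(YEAR from taken) BETWEEN year_start AND
    year_end, plus a day-of-year test that wraps when doy_start > doy_end."""
    if not (year_start <= taken.year <= year_end):
        return False
    doy = taken.timetuple().tm_yday
    if doy_start <= doy_end:                   # e.g. Jun 1 - Aug 31
        return doy_start <= doy <= doy_end
    return doy >= doy_start or doy <= doy_end  # e.g. Dec 22 - Mar 22

# Dec 25, 1950 (doy 359) falls in a Dec 22 - Mar 22 window (356..81)
# over 1950-1970:
assert row_matches(date(1950, 12, 25), 1950, 1970, 356, 81)
```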
On 20 May 2010 19:36, Thom Brown wrote:
> On 20 May 2010 17:36, David Jarvis wrote:
>> Hi, Thom.
>>
>> The query is given two items:
>>
>> Range of years
>> Range of days
>>
>> I need to select all data between the range of days (e.g., Dec 22 - Mar 22)
>> over the range of years (e.g., 1950 - 1970), such as shown here:
On 20 May 2010 17:36, David Jarvis wrote:
> Hi, Thom.
>
> The query is given two items:
>
> Range of years
> Range of days
>
> I need to select all data between the range of days (e.g., Dec 22 - Mar 22)
> over the range of years (e.g., 1950 - 1970), such as shown here:
>
> http://i.imgur.com/MUkuZ.png
Hi,
I have posted an image of the user inputs here:
http://i.imgur.com/MUkuZ.png
The problem is that I am given a range of days (Dec 22 - Mar 22) over a
range of years (1900 - 2009) and the range of days can span from one year to
the next. This is not the same as saying Dec 22, 1900 to Mar 22, 2009.
Matthew Wakeling writes:
> On Wed, 19 May 2010, David Jarvis wrote:
>> extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND
> That portion of the WHERE clause cannot use an index on m.taken. Postgres
> does not look inside functions (like extract) to see if something
> indexable is present. T
On Wed, 19 May 2010, David Jarvis wrote:
extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND
That portion of the WHERE clause cannot use an index on m.taken. Postgres
does not look inside functions (like extract) to see if something
indexable is present. To get an index to work, you c
On 20 May 2010 06:06, David Jarvis wrote:
> Hi,
>
> I recently switched to PostgreSQL from MySQL so that I can use PL/R for data
> analysis. The query in MySQL form (against a more complex table structure)
> takes ~5 seconds to run. The query in PostgreSQL I have yet to let finish,
> as it takes over a minute.
Hello David,
The table aggregates 237 million rows from its child tables. The
sluggishness comes from this part of the query:
m.taken BETWEEN
/* Start date. */
(extract( YEAR FROM m.taken )||'-01-01')::date AND
/* End date. Calculated by checking to see if the end date wraps
into the next year. */
Hi,
I recently switched to PostgreSQL from MySQL so that I can use PL/R for data
analysis. The query in MySQL form (against a more complex table structure)
takes ~5 seconds to run. The query in PostgreSQL I have yet to let finish,
as it takes over a minute. I think I have the correct table structure