Re: [HACKERS] Better estimates of index correlation

2011-03-15 Thread Jeff Davis
On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote: It strikes me that it'd be possible to have btvacuumcleanup directly measure order correlation when it's processing a btree index, yielding a reliable answer for any btree index regardless of number of columns. We could do that by comparing

Re: [HACKERS] Better estimates of index correlation

2011-03-14 Thread Alvaro Herrera
Excerpts from Joshua D. Drake's message of dom mar 13 23:20:01 -0300 2011: On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote: I'm not planning to do anything about this idea right now, since I'm still hip-deep in collations, but I thought I'd throw it out to get it on the record.

Re: [HACKERS] Better estimates of index correlation

2011-03-14 Thread Heikki Linnakangas
On 14.03.2011 16:09, Alvaro Herrera wrote: Excerpts from Joshua D. Drake's message of dom mar 13 23:20:01 -0300 2011: On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote: I'm not planning to do anything about this idea right now, since I'm still hip-deep in collations, but I thought I'd throw

Re: [HACKERS] Better estimates of index correlation

2011-03-14 Thread Robert Haas
On Mon, Mar 14, 2011 at 10:09 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Joshua D. Drake's message of dom mar 13 23:20:01 -0300 2011: On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote: I'm not planning to do anything about this idea right now, since I'm still

Re: [HACKERS] Better estimates of index correlation

2011-03-14 Thread Alvaro Herrera
Excerpts from Robert Haas's message of lun mar 14 11:18:24 -0300 2011: On Mon, Mar 14, 2011 at 10:09 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: It sure would be nice to be able to do it only during the last scan. Does it really matter? What Tom was describing sounded

Re: [HACKERS] Better estimates of index correlation

2011-03-14 Thread Robert Haas
On Mon, Mar 14, 2011 at 10:25 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Robert Haas's message of lun mar 14 11:18:24 -0300 2011: On Mon, Mar 14, 2011 at 10:09 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: It sure would be nice to be able to do it only during

Re: [HACKERS] Better estimates of index correlation

2011-03-14 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Robert Haas's message of lun mar 14 11:18:24 -0300 2011: Does it really matter? What Tom was describing sounded embarassingly cheap. That was my thought exactly. If you could even measure the added cost of doing that, I'd be

Re: [HACKERS] Better estimates of index correlation

2011-03-14 Thread Robert Haas
On Mon, Mar 14, 2011 at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Note that we could remove the correlation calculations from ANALYZE altogether. Only if you don't mind having them only get updated when somebody vacuums. If a table is mostly getting inserted into, it may not get vacuumed

Re: [HACKERS] Better estimates of index correlation

2011-03-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Mon, Mar 14, 2011 at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Note that we could remove the correlation calculations from ANALYZE altogether. Only if you don't mind having them only get updated when somebody vacuums. If a table is mostly

Re: [HACKERS] Better estimates of index correlation

2011-03-14 Thread Josh Berkus
As Heikki says, maybe this wouldn't be an issue at all if we can do it during ANALYZE instead, but I don't know if that works. I'm not convinced you can get a sufficiently good estimate from a small subset of pages. Note that if this requires VACUUM rather than ANALYZE, it introduces a

Re: [HACKERS] Better estimates of index correlation

2011-03-14 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: I'm not convinced you can get a sufficiently good estimate from a small subset of pages. Note that if this requires VACUUM rather than ANALYZE, it introduces a problem for data warehousing users, who can go years between vacuums of their largest tables.

Re: [HACKERS] Better estimates of index correlation

2011-03-14 Thread Josh Berkus
On 3/14/11 5:51 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: I'm not convinced you can get a sufficiently good estimate from a small subset of pages. Note that if this requires VACUUM rather than ANALYZE, it introduces a problem for data warehousing users, who can go years

Re: [HACKERS] Better estimates of index correlation

2011-03-14 Thread Greg Stark
On Tue, Mar 15, 2011 at 12:27 AM, Josh Berkus j...@agliodbs.com wrote: Note that if this requires VACUUM rather than ANALYZE, it introduces a problem for data warehousing users, who can go years between vacuums of their largest tables. I don't understand, are they going years between vacuums

Re: [HACKERS] Better estimates of index correlation

2011-03-14 Thread Josh Berkus
I don't understand, are they going years between vacuums because their data is static? In which case the index correlation won't change. Or is it append-only, in which case I suspect the newly appended data is likely to have the same correlation as the old data. Append-only. And yes, one

[HACKERS] Better estimates of index correlation

2011-03-13 Thread Tom Lane
Currently, we don't measure any statistics about the ordering correlation of multi-column indexes, which means that btcostestimate has to pick a number out of the air if there's more than one column. We've been around on that at least once already: it used to use first column's correlation divided

Re: [HACKERS] Better estimates of index correlation

2011-03-13 Thread Joshua D. Drake
On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote: I'm not planning to do anything about this idea right now, since I'm still hip-deep in collations, but I thought I'd throw it out to get it on the record. Comments? One question: Where is the overhead increase? JD