* Qi Huang (huangq...@hotmail.com) wrote:
> Thanks, guys, for your lively discussion. I'll explore the ANALYZE command first
> and try to make SYSTEM sampling work. I'll look at the other parts later.
That sounds like the right first steps to me. Reviewing this
discussion, it was my thought to crea
> Subject: Re: [HACKERS] Gsoc2012 idea, tablesample
>
> On Thu, May 10, 2012 at 02:30:35PM -0400, Robert Haas wrote:
> > On Thu, May 10, 201
On Fri, May 11, 2012 at 6:16 PM, Kevin Grittner
wrote:
>> MaxHeapTuplesPerPage?
>
> What about dead line pointers without corresponding tuples?
Actually we don't allow there to be more than MaxHeapTuplesPerPage
line pointers even if some of them are dead line pointers.
I think the argument then
Robert Haas wrote:
> On Fri, May 11, 2012 at 11:36 AM, Kevin Grittner
> wrote:
>> Robert Haas wrote:
>>> The trouble is, AFAICS, that you can't bound M very well without
>>> scanning the whole table. I mean, it's bounded by theoretical
>>> limit, but that's it.
>>
>> What would the theoretical
On Fri, May 11, 2012 at 11:36 AM, Kevin Grittner
wrote:
> Robert Haas wrote:
>> The trouble is, AFAICS, that you can't bound M very well without
>> scanning the whole table. I mean, it's bounded by theoretical
>> limit, but that's it.
>
> What would the theoretical limit be?
MaxHeapTuplesPerPage
On May11, 2012, at 16:03 , Kevin Grittner wrote:
>> [more complex alternatives]
>
> I really think your first suggestion covers it perfectly; these more
> complex techniques don't seem necessary to me.
The point of the more complex techniques (especially the algorithm in
my second mail, the "repl
Tom Lane wrote:
> "Kevin Grittner" writes:
>> [ uniformly sample the TID space defined as (1..P, 1..M) ]
>
>> Shouldn't that get us the randomly chosen sample we're looking
>> for? Is there a problem you think this ignores?
>
> Not sure. The issue that I'm wondering about is that the line
> n
On May11, 2012, at 17:20 , Robert Haas wrote:
> On Fri, May 11, 2012 at 11:04 AM, Kevin Grittner
>> Since T cancels out of that equation, we don't need to worry about
>> estimating it. Results will be correct for any value of M which is
>> *at least* as large as the maximum tuple index in the tabl
"Kevin Grittner" writes:
> Robert Haas wrote:
>> The trouble is, AFAICS, that you can't bound M very well without
>> scanning the whole table. I mean, it's bounded by theoretical
>> limit, but that's it.
> What would the theoretical limit be? (block size - page header size
> - minimum size of
Robert Haas wrote:
> The trouble is, AFAICS, that you can't bound M very well without
> scanning the whole table. I mean, it's bounded by theoretical
> limit, but that's it.
What would the theoretical limit be? (block size - page header size
- minimum size of one tuple) / item pointer size?
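For reference, that bound works out roughly as follows (a back-of-envelope Python sketch; the page and header sizes below are assumed 8 KB-page defaults, not values read from any particular build):

```python
# Rough bound on line pointers per heap page, in the spirit of
# (block size - page header size - minimum tuple size) / item pointer size.
BLCKSZ = 8192        # default page size (assumed)
PAGE_HEADER = 24     # SizeOfPageHeaderData on common platforms (assumed)
TUPLE_HEADER = 24    # MAXALIGN'd minimal heap tuple header (assumed)
ITEM_POINTER = 4     # sizeof(ItemIdData)

# Each live tuple costs one line pointer plus at least a tuple header,
# which is how the MaxHeapTuplesPerPage-style bound falls out:
max_heap_tuples_per_page = (BLCKSZ - PAGE_HEADER) // (TUPLE_HEADER + ITEM_POINTER)
print(max_heap_tuples_per_page)  # 291 with these constants
```

With 8 KB pages and these sizes the bound comes out to 291 tuples per page.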
"Kevin Grittner" writes:
> [ uniformly sample the TID space defined as (1..P, 1..M) ]
> Shouldn't that get us the randomly chosen sample we're looking for?
> Is there a problem you think this ignores?
Not sure. The issue that I'm wondering about is that the line number
part of the space is not
On Fri, May 11, 2012 at 11:04 AM, Kevin Grittner
wrote:
> We
> will be probing random page numbers 1..P for random tuple indexes
> 1..M. So how many random probes by ctid does that require? The
> chance of a hit on each probe is ((T/P)/M) -- the average number of
> tuples per page divided by the
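Kevin's probe-count arithmetic can be sketched numerically (illustrative Python only; the table figures below are invented for the example):

```python
# Expected number of random (page, line-number) probes needed to collect
# a sample of size S, given the per-probe hit probability (T/P)/M above.
def expected_probes(T, P, M, S):
    """T = tuples in table, P = pages, M = max line number probed,
    S = sample size wanted. Each probe hits with probability (T/P)/M."""
    p_hit = (T / P) / M
    return S / p_hit

# 1M tuples on 10k pages (100 tuples/page), probing line numbers 1..291:
# each probe hits with probability 100/291, i.e. ~2.91 probes per sample row.
print(round(expected_probes(1_000_000, 10_000, 291, 1000)))  # 2910
```

Note that T cancels out of the *correctness* argument but not out of the *cost*: the sparser the TID grid, the more probes each sample row takes.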
Tom Lane wrote:
> If you're willing to accept that the quality of the results
> depends on having up-to-date stats, then I'd suggest (1) use the
> planner's existing technology to estimate the number of rows in
> the table; (2) multiply by the sampling factor you want to get a
> desired number of sa
"Kevin Grittner" writes:
> Florian Pflug wrote:
>> Maybe one can get rid of these sorts of problems by factoring in
>> the expected density of the table beforehand and simply accepting
>> that the results will be inaccurate if the statistics are
>> outdated?
> Unless I'm missing something, I th
Florian Pflug writes:
> This all hinges on the ability to produce a sufficiently accurate estimate
> of the TID density p_tup/p_tid, of course.
I think that's the least of its problems. AFAICS this analysis ignores
(1) the problem that the TID space is nonuniform, ie we don't know how
many tupl
Florian Pflug wrote:
> Maybe one can get rid of these sorts of problems by factoring in
> the expected density of the table beforehand and simply accepting
> that the results will be inaccurate if the statistics are
> outdated?
>
> One could, for example, simply pick
>
> N := SamplingPercent
On Thu, May 10, 2012 at 02:30:35PM -0400, Robert Haas wrote:
> On Thu, May 10, 2012 at 2:07 PM, Kevin Grittner
> wrote:
> > Ants Aasma wrote:
> >> It seems to me that the simplest thing to do would be to lift the
> >> sampling done in analyze.c (acquire_sample_rows) and use that to
> >> implement
[ Sorry for the self-reply ]
On May11, 2012, at 13:17 , Florian Pflug wrote:
> Actually, thinking more about this, if the approach sketched above
> turns out to work, then one might be able to get away without remembering
> previously computed TIDs, thus removing a lot of complexity.
>
> Say, for
> Florian Pflug wrote:
>> On May10, 2012, at 18:36 , Kevin Grittner wrote:
>>> Robert Haas wrote:
>>>
I wonder if you could do this with something akin to the Bitmap
Heap Scan machinery. Populate a TID bitmap with a bunch of
randomly chosen TIDs, fetch them all in physical order
>
[point splintered in quoting re-joined]
Florian Pflug wrote:
> On May10, 2012, at 18:36 , Kevin Grittner wrote:
>> Robert Haas wrote:
>>
>>> I wonder if you could do this with something akin to the Bitmap
>>> Heap Scan machinery. Populate a TID bitmap with a bunch of
>>> randomly chosen TIDs, f
On May10, 2012, at 18:36 , Kevin Grittner wrote:
> Robert Haas wrote:
>
>> I wonder if you could do this with something akin to the Bitmap
>> Heap Scan machinery. Populate a TID bitmap with a bunch of
>> randomly chosen TIDs, fetch them all in physical order
>> and if you don't get as many rows
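A toy sketch of that bitmap-style idea (illustrative Python, not the actual executor machinery; the in-memory heap dict and the batch size are invented stand-ins):

```python
import random

# Draw random TIDs, visit them in physical order, and keep drawing
# until enough rows are found. 'heap' maps (page, line) -> row for
# occupied slots; misses simply yield nothing.
def sample_rows(heap, n_pages, max_line, want, rng=random):
    found = []
    while len(found) < want:
        # a batch of distinct random TIDs (rows may still repeat
        # across batches in this naive sketch)
        batch = {(rng.randrange(n_pages), rng.randrange(1, max_line + 1))
                 for _ in range(want * 2)}
        for tid in sorted(batch):          # physical (page, line) order
            row = heap.get(tid)
            if row is not None:
                found.append(row)
                if len(found) == want:
                    break
    return found
```

Fetching in sorted TID order is what gives the bitmap-heap-scan-like I/O pattern: each heap page is touched at most once per batch, in file order.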
On Thu, May 10, 2012 at 2:07 PM, Kevin Grittner
wrote:
> Ants Aasma wrote:
>> It seems to me that the simplest thing to do would be to lift the
>> sampling done in analyze.c (acquire_sample_rows) and use that to
>> implement the SYSTEM sampling method.
>
> Definitely. I thought we had all agreed
Ants Aasma wrote:
> It seems to me that the simplest thing to do would be to lift the
> sampling done in analyze.c (acquire_sample_rows) and use that to
> implement the SYSTEM sampling method.
Definitely. I thought we had all agreed on that ages ago.
-Kevin
On Thu, May 10, 2012 at 6:33 PM, Robert Haas wrote:
> I'm worried this project is getting so complicated that it will be
> beyond the ability of a new hacker to get anything useful done. Can
> we simplify the requirements here to something that is reasonable for
> a beginner?
It seems to me that
Robert Haas wrote:
> I wonder if you could do this with something akin to the Bitmap
> Heap Scan machinery. Populate a TID bitmap with a bunch of
> randomly chosen TIDs, fetch them all in physical order
It would be pretty hard for any other plan to beat that by very
much, so it seems like a g
On Thu, May 10, 2012 at 10:28 AM, Kevin Grittner
wrote:
>> One problem I see with this approach is that its efficiency
>> depends on the average tuple length, at least with a naive
>> approach to random ctid generator. The simplest way to generate
>> those randomly without introducing bias is to g
Florian Pflug wrote:
> One problem I see with this approach is that its efficiency
> depends on the average tuple length, at least with a naive
> approach to random ctid generator. The simplest way to generate
> those randomly without introducing bias is to generate a random
> page index between
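The rejection-based generator Florian is describing can be sketched like this (illustrative Python; the in-memory heap dict stands in for the real page structure):

```python
import random

# Unbiased ctid draw over the full (page 0..P-1, line 1..M) grid:
# misses (empty or dead slots) are rejected and redrawn rather than
# redirected to a neighbouring slot, so every live tuple has the same
# 1/(P*M) chance per draw regardless of how full its page is.
def draw_tuple(heap, n_pages, max_line, rng=random):
    while True:
        tid = (rng.randrange(n_pages), rng.randrange(1, max_line + 1))
        row = heap.get(tid)
        if row is not None:      # hit a live tuple; accept
            return row
        # miss: reject and redraw -- this is exactly where efficiency
        # depends on average tuple length (density of the TID grid)
```

The bias Florian warns about comes from shortcuts like "pick a random page, then a random *occupied* slot on it": that oversamples tuples on sparse pages, which plain rejection avoids.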
On May10, 2012, at 10:43 , Qi Huang wrote:
> 2. use TIDSCAN to directly access tuples. The below way of using ctid
> proposed by Kevin looks good.
>
> -One technique which might be suitably random without reading the
> -whole table would be to figure out a maximum block number and tuple
> -ID fo
Hi, All

Thanks for your ideas on the implementation of TABLESAMPLE. I have a
summary below of the high-level requirements from the -hackers thread so far.
Please give further comments, and if I missed any point, please feel free to add.
1. Build a new type of node, as I should not use SEQSCAN
On Tue, Apr 24, 2012 at 10:31 AM, Sandro Santilli wrote:
> On Tue, Apr 24, 2012 at 08:49:26AM +0200, Sandro Santilli wrote:
>> On Mon, Apr 23, 2012 at 08:34:44PM +0300, Ants Aasma wrote:
>
>> > SELECT (SELECT reservoir_sample(some_table, 50) AS samples
>> > FROM some_table WHERE ctid =~ ANY (rn
On Tue, Apr 24, 2012 at 08:49:26AM +0200, Sandro Santilli wrote:
> On Mon, Apr 23, 2012 at 08:34:44PM +0300, Ants Aasma wrote:
> > SELECT (SELECT reservoir_sample(some_table, 50) AS samples
> >FROM some_table WHERE ctid =~ ANY (rnd_pgtids))
> > FROM random_pages('some_table', 50) AS rnd_pgtids
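The reservoir_sample() call above presumably does a one-pass reservoir sample; the textbook version of that technique, Algorithm R, looks like this (illustrative Python sketch, not the actual function from the thread):

```python
import random

# One-pass reservoir sample (Algorithm R): keep k rows from a stream
# of unknown length, each row ending up in the result with equal
# probability k/N.
def reservoir_sample(rows, k, rng=random):
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)       # uniform in [0, i]
            if j < k:
                reservoir[j] = row         # replace with probability k/(i+1)
    return reservoir

print(len(reservoir_sample(range(1000), 50)))  # 50
```

Its appeal here is that it needs no advance knowledge of the row count, which sidesteps the M-estimation problem discussed elsewhere in the thread.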
On Mon, Apr 23, 2012 at 08:34:44PM +0300, Ants Aasma wrote:
> On Mon, Apr 23, 2012 at 4:37 PM, Sandro Santilli wrote:
> > I'd love to see enhanced CTID operators, to fetch all visible tuples in a
> > page
> > using a tidscan. Something like: WHERE ctid =~ '(501,*)' or a ctidrange.
>
> Among oth
On Mon, Apr 23, 2012 at 4:37 PM, Sandro Santilli wrote:
> I'd love to see enhanced CTID operators, to fetch all visible tuples in a page
> using a tidscan. Something like: WHERE ctid =~ '(501,*)' or a ctidrange.
Among other things, this would enable user-space implementation of
tablesample. Give
On Sat, Apr 21, 2012 at 02:28:52PM +0800, Qi Huang wrote:
>
> Hi, Heikki
...
> > Another idea that Robert Haas suggested was to add support for doing a TID
> > scan for a query like "WHERE ctid < '(501,1)'". That's not enough work
> > for a GSoC project on its own, but could certainly be a part of it.
Hi, Heikki
> 1. We probably don't want the SQL syntax to be added to the grammar.
> This should be written as an extension, using custom functions as the
> API, instead of extra SQL syntax.
>
> 2. It's not very useful if it's just a dummy replacement for "WHERE
> random() < ?". It has to be mo
On Thu, Apr 19, 2012 at 08:47:51AM -0400, Stephen Frost wrote:
> * Sandro Santilli (s...@keybit.net) wrote:
> > Actually a random sample would really be representative of the data
> > distribution. What the type analyzer gets is a sample and that sample
> > is what the estimator looks at to answer
* Sandro Santilli (s...@keybit.net) wrote:
> Actually a random sample would really be representative of the data
> distribution. What the type analyzer gets is a sample and that sample
> is what the estimator looks at to answer the question:
That might work if all you have is point data, but lines
On Mon, 16 Apr 2012 23:17:25 -0700, Heikki Linnakangas wrote:
> 1. We probably don't want the SQL syntax to be added to the
>grammar. This should be written as an extension, using custom
>functions as the API, instead of extra SQL syntax.
I can't find the discussion about this, have any p
On Tue, Apr 17, 2012 at 04:29:52PM -0400, Stephen Frost wrote:
> Josh,
>
> * Josh Berkus (j...@agliodbs.com) wrote:
> > FWIW, the PostGIS folks would *really* love to have a TABLESAMPLE which
> > worked with geographic indexes. This would be tremendously useful for
> > constructing low-resolution
> Date: Wed, 18 Apr 2012 02:45:09 +0300
> Subject: Re: [HACKERS] Gsoc2012 idea, tablesample
> From: a...@cybertec.at
> To: cbbro...@gmail.com
> CC: sfr...@snowman.net; pgsql-hackers@postgresql.org
>
> On Tue, Apr 17, 2012 at 7:33 PM, Christopher Browne
> wrote:
>
On Tue, Apr 17, 2012 at 7:33 PM, Christopher Browne wrote:
> Well, there may be cases where the quality of the sample isn't
> terribly important, it just needs to be "reasonable."
>
> I browsed an article on the SYSTEM/BERNOULLI representations; they
> both amount to simple picks of tuples.
>
> -
Josh,
* Josh Berkus (j...@agliodbs.com) wrote:
> FWIW, the PostGIS folks would *really* love to have a TABLESAMPLE which
> worked with geographic indexes. This would be tremendously useful for
> constructing low-resolution "zoom out" tiles on maps and similar.
I'm familiar with the concept of 'z
Qi, Hackers:
FWIW, the PostGIS folks would *really* love to have a TABLESAMPLE which
worked with geographic indexes. This would be tremendously useful for
constructing low-resolution "zoom out" tiles on maps and similar.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Tue, Apr 17, 2012 at 5:33 PM, Christopher Browne wrote:
> I get the feeling that this is a somewhat-magical feature (in that
> users haven't much hope of understanding in what ways the results are
> deterministic) that is sufficiently "magical" that anyone serious
> about their result sets is l
On Tue, Apr 17, 2012 at 11:27 AM, Stephen Frost wrote:
> Qi,
>
> * Qi Huang (huangq...@hotmail.com) wrote:
>> > Doing it 'right' certainly isn't going to be simply taking what Neil did
>> > and updating it, and I understand Tom's concerns about having this be
>> > more than a hack on seqscan, so I
Qi,
* Qi Huang (huangq...@hotmail.com) wrote:
> > Doing it 'right' certainly isn't going to be simply taking what Neil did
> > and updating it, and I understand Tom's concerns about having this be
> > more than a hack on seqscan, so I'm a bit nervous that this would turn
> > into something bigger
> > 2. It's not very useful if it's just a dummy replacement for "WHERE
> > random() < ?". It has to be more advanced than that. Quality of the
> > sample is important, as is performance. There was also an
> > interesting idea of implementing monetary unit sampling.
>
> In reviewing this, I go
On Tue, Apr 17, 2012 at 2:49 PM, Stephen Frost wrote:
> * Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote:
>> 1. We probably don't want the SQL syntax to be added to the grammar.
>> This should be written as an extension, using custom functions as
>> the API, instead of extra SQL sy
Stephen Frost writes:
> * Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote:
>> Another idea that Robert Haas suggested was to add support for doing a
>> TID scan for a query like "WHERE ctid < '(501,1)'". That's not
>> enough work for a GSoC project on its own, but could certainly be a
>>
* Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote:
> 1. We probably don't want the SQL syntax to be added to the grammar.
> This should be written as an extension, using custom functions as
> the API, instead of extra SQL syntax.
Err, I missed that, and don't particularly agree with
On 17.04.2012 14:55, Qi Huang wrote:
Hi, Heikki
Thanks for your advice. I will change my plan accordingly, but I have a
few questions.
1. We probably don't want the SQL syntax to be added to the grammar.
This should be written as an extension, using custom functions as the
API, instead of
Besides, I saw that the GSoC site editing has been closed. Should I just submit
through this mailing list with an attachment?
Best Regards and Thanks
Huang Qi Victor
Computer Science of National University of Singapore
> Date: Tue, 17 Apr 2012 09:16:29 +0300
> From: heikki.linnakan...@enterprisedb.com
>
Hi, Heikki
Thanks for your advice. I will change my plan accordingly, but I have a
few questions.
> 1. We probably don't want the SQL syntax to be added to the grammar.
> This should be written as an extension, using custom functions as the
> API, instead of extra SQL syntax.
>
1. "Th