Re: CPU costs of random_zipfian in pgbench

2019-04-03 Thread Tom Lane
Fabien COELHO writes:
>> Ah, so we now we can get rid of the TState * being passed around
>> separately for expression execution, too?
> Indeed.
Indeed. Pushed with some additional cleanup.
regards, tom lane

Re: CPU costs of random_zipfian in pgbench

2019-04-03 Thread Fabien COELHO
> Ah, so we now we can get rid of the TState * being passed around separately for expression execution, too?
Indeed. I would have thought that the compiler would have warned if it is unused, but because of the recursion it is uselessly used. Ok, maybe the sentence above is not very clear. At

Re: CPU costs of random_zipfian in pgbench

2019-04-01 Thread Tom Lane
Alvaro Herrera writes:
> On 2019-Apr-01, Tom Lane wrote:
>> Seems reasonable. Pushed with minor documentation editing.
> Ah, so we now we can get rid of the TState * being passed around
> separately for expression execution, too?
I didn't really look for follow-on simplifications, but if there

Re: CPU costs of random_zipfian in pgbench

2019-04-01 Thread Alvaro Herrera
On 2019-Apr-01, Tom Lane wrote:
> Fabien COELHO writes:
> >> I was wondering about that too. It seems like it'd be a wise idea to
> >> further constrain s and/or n to ensure that the s > 1 code path doesn't do
> >> anything too awful ...
>
> > Yep. The attached version enforces s >= 1.001, whic

Re: CPU costs of random_zipfian in pgbench

2019-04-01 Thread Tom Lane
Fabien COELHO writes:
>> I was wondering about that too. It seems like it'd be a wise idea to
>> further constrain s and/or n to ensure that the s > 1 code path doesn't do
>> anything too awful ...
> Yep. The attached version enforces s >= 1.001, which avoids the worse cost
> of iterating, accor
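The lower bound mentioned in this message can be illustrated with a small check. This is only a sketch under stated assumptions: the value 1.001 comes from the message above, while the macro name, the function name, and the error strings are hypothetical and are not taken from the patch.

```c
#include <math.h>
#include <stddef.h>

/* Lower bound quoted in the thread; the macro name is hypothetical. */
#define ZIPF_MIN_S_SKETCH 1.001

/*
 * Hypothetical validation helper: returns NULL when s is acceptable,
 * otherwise a short message describing why it was rejected.
 */
const char *
zipf_check_param_sketch(double s)
{
    /* NaN compares false with everything, so reject it explicitly. */
    if (isnan(s))
        return "zipfian parameter must be a number";

    /* Values below the bound would take the costly iterative path. */
    if (s < ZIPF_MIN_S_SKETCH)
        return "zipfian parameter must be at least 1.001";

    return NULL;
}
```

A caller would report the returned message and abort the script when the result is not NULL, so that an unsupported parameter fails up front rather than burning CPU.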

Re: CPU costs of random_zipfian in pgbench

2019-03-24 Thread Fabien COELHO
Hello Tom, If this is done, some people with zipfian distribution that currently work might be unhappy. After giving it some thought, I think that this cannot be fully fixed for 12. Just to clarify --- my complaint about "over engineering" referred to the fact that a cache exists at all; f

Re: CPU costs of random_zipfian in pgbench

2019-03-24 Thread Tom Lane
Fabien COELHO writes: I remain of the opinion that we ought to simply rip out support for zipfian with s < 1. >>> +1 to that >> If this is done, some people with zipfian distribution that currently >> work might be unhappy. > After giving it some thought, I think that this cannot be

Re: CPU costs of random_zipfian in pgbench

2019-03-24 Thread Fabien COELHO
Hello Tom & Tomas, If the choice is between reporting the failure to the user, and addressing the failure, surely the latter would be the default option? Particularly if the user can't really address the issue easily (recompiling psql is not very practical solution). I remain of the opinion t

Re: CPU costs of random_zipfian in pgbench

2019-03-24 Thread Fabien COELHO
Hello Tomas, What would a user do with this information, and how would they know what to do? Sure, but it was unclear what to do. Extending the cache to avoid that would look like over-engineering. That seems like a rather strange argument. What exactly is so complex on resizing the cache

Re: CPU costs of random_zipfian in pgbench

2019-03-24 Thread Fabien COELHO
What is the point of that, and if there is a point, why is it nowhere mentioned in pgbench.sgml? The attached patch simplifies the code by erroring on cache overflow, instead of the LRU replacement strategy and unhelpful final report. The above lines are removed. Eh? Do I understand correct

Re: CPU costs of random_zipfian in pgbench

2019-03-23 Thread Tomas Vondra
On 3/23/19 6:44 PM, Fabien COELHO wrote:
>
> Hello Tom,
>
>> I started to look through this, and the more I looked the more unhappy
>> I got that we're having this discussion at all.  The zipfian support
>> in pgbench is seriously over-engineered and under-documented.  As an
>> example, I was

Re: CPU costs of random_zipfian in pgbench

2019-03-23 Thread Tomas Vondra
On 3/23/19 7:45 PM, Fabien COELHO wrote:
> What is the point of that, and if there is a point, why is it nowhere mentioned in pgbench.sgml?
>>
>> The attached patch simplifies the code by erroring on cache overflow,
>> instead of the LRU replacement strategy and unhelpful final report

Re: CPU costs of random_zipfian in pgbench

2019-03-23 Thread Fabien COELHO
What is the point of that, and if there is a point, why is it nowhere mentioned in pgbench.sgml? The attached patch simplifies the code by erroring on cache overflow, instead of the LRU replacement strategy and unhelpful final report. The above lines are removed. Same, but without the comp

Re: CPU costs of random_zipfian in pgbench

2019-03-23 Thread Fabien COELHO
Hello again, I started to look through this, and the more I looked the more unhappy I got that we're having this discussion at all. The zipfian support in pgbench is seriously over-engineered and under-documented. As an example, I was flabbergasted to find out that the end-of-run summary stat

Re: CPU costs of random_zipfian in pgbench

2019-03-23 Thread Fabien COELHO
Hello Tom, I started to look through this, and the more I looked the more unhappy I got that we're having this discussion at all. The zipfian support in pgbench is seriously over-engineered and under-documented. As an example, I was flabbergasted to find out that the end-of-run summary stati

Re: CPU costs of random_zipfian in pgbench

2019-03-23 Thread Tom Lane
Fabien COELHO writes: > [ pgbench-zipf-doc-3.patch ] I started to look through this, and the more I looked the more unhappy I got that we're having this discussion at all. The zipfian support in pgbench is seriously over-engineered and under-documented. As an example, I was flabbergasted to fin

Re: CPU costs of random_zipfian in pgbench

2019-03-13 Thread Georgios Kokolatos
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested
Version 3 of the patch looks ready for committer. Thank you for taking the t

Re: CPU costs of random_zipfian in pgbench

2019-03-13 Thread Fabien COELHO
For whatever it is worth, the patch looks good to me. A minor nitpick would be to use a verb in the part: `cost when the parameter in (0, 1)` maybe: `cost when the parameter's value is in (0, 1)` or similar. Looks ok. Apart from that, I would suggest it that the patch could be moved to

Re: CPU costs of random_zipfian in pgbench

2019-03-13 Thread Georgios Kokolatos
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested
For whatever it is worth, the patch looks good to me. A minor nitpick would

Re: CPU costs of random_zipfian in pgbench

2019-02-22 Thread Fabien COELHO
I also noticed that i is int in this function, but n is int64. That seems like an oversight.
Indeed, that is a bug! Here is a v2 with hopefully better wording, comments and a fix for the bug you pointed out.
-- Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.s

Re: CPU costs of random_zipfian in pgbench

2019-02-22 Thread Fabien COELHO
There are pretty good approximations for s > 1.0 using Riemann zeta function and Euler derived a formula for the s = 1 case. I believe that's what random_zipfian() already uses, because for s > 1.0 it refers to "Non-Uniform Random Variate Generation" by Luc Devroye, and the text references th

Re: CPU costs of random_zipfian in pgbench

2019-02-22 Thread Tomas Vondra
On 2/22/19 11:22 AM, Ants Aasma wrote:
> On Sun, Feb 17, 2019 at 10:52 AM Fabien COELHO wrote:
>
> > I'm trying to use random_zipfian() for benchmarking of skewed data sets,
> > and I ran head-first into an issue with rather excessive CPU costs.
>

Re: CPU costs of random_zipfian in pgbench

2019-02-22 Thread Ants Aasma
On Sun, Feb 17, 2019 at 10:52 AM Fabien COELHO wrote:
> > I'm trying to use random_zipfian() for benchmarking of skewed data sets,
> > and I ran head-first into an issue with rather excessive CPU costs.
> > [...] This happens because generalizedHarmonicNumber() does this:
> >
> > for (i = n

Re: CPU costs of random_zipfian in pgbench

2019-02-19 Thread Peter Geoghegan
On Tue, Feb 19, 2019 at 7:14 AM Fabien COELHO wrote:
> What I like in "pgbench" is that it is both versatile and simple so that
> people can benchmark their own data with their own load and their own
> queries by writing a few lines of trivial SQL and psql-like slash command
> and adjusting a few

Re: CPU costs of random_zipfian in pgbench

2019-02-19 Thread David Fetter
On Sun, Feb 17, 2019 at 11:02:37PM +0100, Tomas Vondra wrote: > On 2/17/19 6:33 PM, David Fetter wrote: > > On Sun, Feb 17, 2019 at 11:09:27AM -0500, Tom Lane wrote: > >> Fabien COELHO writes: > I'm trying to use random_zipfian() for benchmarking of skewed data sets, > and I ran head-fi

Re: CPU costs of random_zipfian in pgbench

2019-02-19 Thread Fabien COELHO
Hello Peter, My 0.02€: I'm not quite interested in maintaining a tool for *one* benchmark, whatever the benchmark, its standardness or quality. What I like in "pgbench" is that it is both versatile and simple so that people can benchmark their own data with their own load and their own quer

Re: CPU costs of random_zipfian in pgbench

2019-02-17 Thread Peter Geoghegan
On Sun, Feb 17, 2019 at 8:09 AM Tom Lane wrote:
> TBH, I think pgbench is now much too complex; it does not need more
> features, especially not ones that need large caveats in the docs.
> (What exactly is the point of having zipfian at all?)
I agree that pgbench is too complex, given its mandate

Re: CPU costs of random_zipfian in pgbench

2019-02-17 Thread Tomas Vondra
On 2/17/19 5:09 PM, Tom Lane wrote:
> Fabien COELHO writes:
>>> I'm trying to use random_zipfian() for benchmarking of skewed data sets,
>>> and I ran head-first into an issue with rather excessive CPU costs.
>
>> If you want skewed but not especially zipfian, use exponential which is
>> quite

Re: CPU costs of random_zipfian in pgbench

2019-02-17 Thread Tomas Vondra
On 2/17/19 6:33 PM, David Fetter wrote: > On Sun, Feb 17, 2019 at 11:09:27AM -0500, Tom Lane wrote: >> Fabien COELHO writes: I'm trying to use random_zipfian() for benchmarking of skewed data sets, and I ran head-first into an issue with rather excessive CPU costs. >> >>> If you want s

Re: CPU costs of random_zipfian in pgbench

2019-02-17 Thread David Fetter
On Sun, Feb 17, 2019 at 11:09:27AM -0500, Tom Lane wrote:
> Fabien COELHO writes:
> >> I'm trying to use random_zipfian() for benchmarking of skewed data sets,
> >> and I ran head-first into an issue with rather excessive CPU costs.
> >
> > If you want skewed but not especially zipfian, use expon

Re: CPU costs of random_zipfian in pgbench

2019-02-17 Thread Tom Lane
Fabien COELHO writes:
>> I'm trying to use random_zipfian() for benchmarking of skewed data sets,
>> and I ran head-first into an issue with rather excessive CPU costs.
> If you want skewed but not especially zipfian, use exponential which is
> quite cheap. Also zipfian with a > 1.0 parameter

Re: CPU costs of random_zipfian in pgbench

2019-02-17 Thread Fabien COELHO
Hello Tomas,
> I'm trying to use random_zipfian() for benchmarking of skewed data sets, and I ran head-first into an issue with rather excessive CPU costs. [...] This happens because generalizedHarmonicNumber() does this:
>     for (i = n; i > 1; i--)
>         ans += pow(i, -s);
> wher

CPU costs of random_zipfian in pgbench

2019-02-16 Thread Tomas Vondra
Hi, I'm trying to use random_zipfian() for benchmarking of skewed data sets, and I ran head-first into an issue with rather excessive CPU costs. Consider an example like this:
    pgbench -i -s 1 test
    pgbench -s 1 -f zipf.sql -T 30 test
where zipf.sql does this:
    \SET id random_zipfi