Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-07-24 Thread Stefan Kaltenbrunner
On 07/24/2011 03:50 AM, Jeff Janes wrote: On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: On 06/13/2011 01:55 PM, Stefan Kaltenbrunner wrote: [...] all those tests are done with pgbench running on the same box - which has a noticeable impact on the

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-07-24 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes: How was this profile generated? I get a similar profile using --enable-profiling and gprof, but I find it not believable. The complete absence of any calls to libpq is not credible. I don't know about your profiler, but with gprof they should be

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-07-24 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: interesting - iirc we actually had some reports about current libpq behaviour causing scaling issues on some OSes - see http://archives.postgresql.org/pgsql-hackers/2009-06/msg00748.php and some related threads. Iirc the final patch for that

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-07-24 Thread Stefan Kaltenbrunner
On 07/24/2011 05:55 PM, Tom Lane wrote: Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: interesting - iirc we actually had some reports about current libpq behaviour causing scaling issues on some OSes - see http://archives.postgresql.org/pgsql-hackers/2009-06/msg00748.php and some

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-07-23 Thread Jeff Janes
On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: On 06/13/2011 01:55 PM, Stefan Kaltenbrunner wrote: [...] all those tests are done with pgbench running on the same box - which has a noticeable impact on the results because pgbench is using ~1 core per 8

Re: [HACKERS] lazy vxid locks, v1

2011-06-22 Thread Florian Pflug
On Jun12, 2011, at 23:39 , Robert Haas wrote: So, the majority (60%) of the excess spinning appears to be due to SInvalReadLock. A good chunk are due to ProcArrayLock (25%). Hm, sizeof(LWLock) is 24 on X86-64, making sizeof(LWLockPadded) 32. However, cache lines are 64 bytes large on recent
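The repair being discussed is to pad each LWLock out to a full cache line so that two hot locks can never share one. A minimal sketch of that idea in C, assuming 64-byte lines and the 24-byte lock size quoted above (the field layout is illustrative, not PostgreSQL's actual LWLock definition):

    #include <stdint.h>

    #define CACHE_LINE_SIZE 64

    /* Illustrative stand-in for LWLock: ~24 bytes on x86-64 (four bytes
     * of flags/counts, alignment padding, then two 8-byte pointers). */
    typedef struct LWLockLike
    {
        volatile uint8_t mutex;     /* spinlock guarding the fields below */
        uint8_t  exclusive;         /* # of exclusive holders (0 or 1) */
        uint16_t shared;            /* # of shared holders */
        void    *head;              /* wait-queue head */
        void    *tail;              /* wait-queue tail */
    } LWLockLike;

    /* With 32-byte padding, two locks share one 64-byte line and that
     * line ping-pongs between CPUs when both locks are hot (false
     * sharing); padding to the full line gives each lock a line of its
     * own, at the cost of doubling the lock array's memory footprint. */
    typedef union LWLockPaddedLike
    {
        LWLockLike lock;
        char       pad[CACHE_LINE_SIZE];
    } LWLockPaddedLike;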

Re: [HACKERS] lazy vxid locks, v1

2011-06-14 Thread Robert Haas
On Mon, Jun 13, 2011 at 8:10 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Sun, Jun 12, 2011 at 2:39 PM, Robert Haas robertmh...@gmail.com wrote: ... Profiling reveals that the system spends enormous amounts of CPU time in s_lock.  LWLOCK_STATS reveals that the only lwlock with significant

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-14 Thread Jeff Janes
On Mon, Jun 13, 2011 at 9:09 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: I noticed that pgbench's doCustom (the function highest in the profile posted) returns doing nothing if the connection is supposed to be sleeping; seems an open door for busy waiting.  I didn't check the rest of

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-14 Thread Stefan Kaltenbrunner
On 06/14/2011 02:27 AM, Jeff Janes wrote: On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: ... so it seems that sysbench actually has significantly less overhead than pgbench, and the lower throughput at the higher concurrency seems to be caused by sysbench

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Stefan Kaltenbrunner
On 06/12/2011 11:39 PM, Robert Haas wrote: Here is a patch that applies over the "reducing the overhead of frequent table locks" (fastlock-v3) patch and allows heavyweight VXID locks to spring into existence only when someone wants to wait on them. I believe there is a large benefit to be had

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Kevin Grittner
Stefan Kaltenbrunner wrote: on that particular 40-core/80-thread box: unpatched: c40:tps = 107689.945323 (including connections establishing) c80:tps = 101885.549081 (including connections establishing) fast locks: c40:tps = 215807.263233 (including connections

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Stefan Kaltenbrunner
On 06/13/2011 02:29 PM, Kevin Grittner wrote: Stefan Kaltenbrunner wrote: on that particular 40-core/80-thread box: unpatched: c40:tps = 107689.945323 (including connections establishing) c80:tps = 101885.549081 (including connections establishing) fast locks: c40:

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Stefan Kaltenbrunner
On 06/12/2011 11:39 PM, Robert Haas wrote: Here is a patch that applies over the "reducing the overhead of frequent table locks" (fastlock-v3) patch and allows heavyweight VXID locks to spring into existence only when someone wants to wait on them. I believe there is a large benefit to be had

pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Stefan Kaltenbrunner
On 06/13/2011 01:55 PM, Stefan Kaltenbrunner wrote: [...] all those tests are done with pgbench running on the same box - which has a noticeable impact on the results because pgbench is using ~1 core per 8 cores of the backend tested in cpu resources - though I don't think it causes any

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: On 06/12/2011 11:39 PM, Robert Haas wrote: Profiling reveals that the system spends enormous amounts of CPU time in s_lock. just to reiterate that with numbers - at 160 threads with both patches applied the profile looks like: samples

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Robert Haas
On Mon, Jun 13, 2011 at 10:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: On 06/12/2011 11:39 PM, Robert Haas wrote: Profiling reveals that the system spends enormous amounts of CPU time in s_lock. just to reiterate that with numbers - at 160

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Jeff Janes
On Sun, Jun 12, 2011 at 2:39 PM, Robert Haas robertmh...@gmail.com wrote: ... Profiling reveals that the system spends enormous amounts of CPU time in s_lock.  LWLOCK_STATS reveals that the only lwlock with significant amounts of blocking is the BufFreelistLock; This is curious. Clearly the

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Jeff Janes
On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: ... so it seems that sysbench actually has significantly less overhead than pgbench, and the lower throughput at the higher concurrency seems to be caused by sysbench being able to stress the backend even more

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Itagaki Takahiro
On Tue, Jun 14, 2011 at 09:27, Jeff Janes jeff.ja...@gmail.com wrote: pgbench sends each query (per connection) and waits for the reply before sending another. We can use the -j option to run pgbench in multiple threads to avoid request starvation. What setting did you use, Stefan? For those
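The -j mechanism splits the client connections across independent driver threads, so a CPU-saturated thread can delay only its own slice of the connections. A hedged sketch of that division of labor (names and the per-thread loop are invented here, not pgbench's actual code; -c 80 -j 8 mirrors the 80-client runs above):

    #include <pthread.h>

    #define NCLIENTS 80            /* e.g. pgbench -c 80 -j 8 */
    #define NTHREADS  8

    typedef struct ThreadState
    {
        int first_client;          /* this thread's slice of connections */
        int nclients;              /* NCLIENTS / NTHREADS of them */
    } ThreadState;

    /* Each driver thread runs its own socket loop over its slice, so a
     * saturated thread starves at most NCLIENTS/NTHREADS connections
     * rather than all of them. */
    static void *
    thread_main(void *arg)
    {
        ThreadState *ts = (ThreadState *) arg;

        (void) ts;  /* ... per-thread select() loop over its clients ... */
        return NULL;
    }

    static void
    launch_threads(void)
    {
        pthread_t   tid[NTHREADS];
        ThreadState ts[NTHREADS];
        int         per_thread = NCLIENTS / NTHREADS;

        for (int i = 0; i < NTHREADS; i++)
        {
            ts[i].first_client = i * per_thread;
            ts[i].nclients = per_thread;
            pthread_create(&tid[i], NULL, thread_main, &ts[i]);
        }
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
    }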

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Greg Smith
On 06/13/2011 08:27 PM, Jeff Janes wrote: pgbench sends each query (per connection) and waits for the reply before sending another. Do we know whether sysbench does that, or if it just stuffs the kernel's IPC buffer full of queries without synchronously waiting for individual replies?
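The behaviour Jeff describes is the ordinary synchronous libpq pattern: one query in flight per connection, with the client idle for a full network-plus-execution round trip each time. A minimal illustration (not pgbench's actual code; the query text is just an example):

    #include <libpq-fe.h>

    /* One synchronous round trip: PQexec() does not return until the
     * server's reply has been read in full, so the driver sits idle for
     * the whole latency of each query before it can send the next one. */
    static void
    run_query(PGconn *conn)
    {
        PGresult *res = PQexec(conn,
                               "UPDATE pgbench_accounts "
                               "SET abalance = abalance + 1 WHERE aid = 1;");

        PQclear(res);
    }

A driver that instead writes several queries into the socket before reading any replies removes that per-query stall, which would be one way for a benchmark to stress the backends harder at the same connection count.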

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Greg Smith
On 06/13/2011 07:55 AM, Stefan Kaltenbrunner wrote: all those tests are done with pgbench running on the same box - which has a noticable impact on the results because pgbench is using ~1 core per 8 cores of the backend tested in cpu resoures - though I don't think it causes any changes in the

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Alvaro Herrera
Excerpts from Jeff Janes's message of Mon Jun 13 20:27:15 -0400 2011: On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: ... so it seems that sysbench actually has significantly less overhead than pgbench, and the lower throughput at the higher

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Itagaki Takahiro
On Tue, Jun 14, 2011 at 13:09, Alvaro Herrera alvhe...@commandprompt.com wrote: I noticed that pgbench's doCustom (the function highest in the profile posted) returns doing nothing if the connection is supposed to be sleeping; seems an open door for busy waiting. pgbench uses select()
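The usual repair for the \sleep case is to fold the nearest wakeup time into the select() timeout instead of polling with nothing to do. A sketch of that pattern (names invented; this is not pgbench's doCustom):

    #include <stdint.h>
    #include <sys/select.h>
    #include <sys/time.h>

    /* Block until a connection socket becomes readable or the earliest
     * \sleep deadline expires.  nearest_wakeup_us <= 0 means no
     * connection is sleeping, so we wait on the sockets alone. */
    static void
    wait_for_work(int max_fd, fd_set *readable,
                  int64_t nearest_wakeup_us, int64_t now_us)
    {
        struct timeval  tv;
        struct timeval *tvp = NULL;    /* NULL: wait indefinitely */

        if (nearest_wakeup_us > 0)
        {
            int64_t delay_us = nearest_wakeup_us - now_us;

            if (delay_us < 0)
                delay_us = 0;          /* a sleep is already due */
            tv.tv_sec = (time_t) (delay_us / 1000000);
            tv.tv_usec = (suseconds_t) (delay_us % 1000000);
            tvp = &tv;
        }
        (void) select(max_fd + 1, readable, NULL, NULL, tvp);
    }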

[HACKERS] lazy vxid locks, v1

2011-06-12 Thread Robert Haas
Here is a patch that applies over the "reducing the overhead of frequent table locks" (fastlock-v3) patch and allows heavyweight VXID locks to spring into existence only when someone wants to wait on them. I believe there is a large benefit to be had from this optimization, because the combination
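In outline, the idea is that a backend advertises its VXID cheaply and the heavyweight lock only materializes when another backend actually needs to wait on it. A conceptual sketch (names and structure are illustrative, not the patch's actual code):

    #include <stdbool.h>

    /* The common case -- nobody ever waits on the VXID -- touches only
     * this per-backend slot and never enters the lock manager at all. */
    typedef struct BackendSlot
    {
        int  vxid;            /* virtual transaction id advertised here */
        bool lock_requested;  /* set by the first would-be waiter */
    } BackendSlot;

    /* Fast path, at every transaction start: no lock-table entry. */
    static void
    advertise_vxid(BackendSlot *me, int vxid)
    {
        me->vxid = vxid;
        me->lock_requested = false;
    }

    /* Slow path, only when someone must wait: flag the owner so the
     * VXID gets turned into a real heavyweight lock, then block on
     * that lock through the normal lock manager. */
    static void
    wait_for_vxid(BackendSlot *owner)
    {
        owner->lock_requested = true;
        /* ... acquire the now-materialized heavyweight lock ... */
    }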

Re: [HACKERS] lazy vxid locks, v1

2011-06-12 Thread Greg Stark
On Sun, Jun 12, 2011 at 10:39 PM, Robert Haas robertmh...@gmail.com wrote: I hacked up the system to report how often each lwlock spinlock exceeded spins_per_delay. I don't doubt the rest of your analysis, but one thing to note: the number of spins on a spinlock is not the same as the amount of time
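Concretely, what gets counted is back-off events in a spin-then-sleep loop along these lines (modeled loosely on s_lock, with illustrative constants; not the real code):

    #include <stdatomic.h>
    #include <unistd.h>

    #define SPINS_PER_DELAY 100        /* illustrative; tunable in reality */

    /* The counter ticks once per back-off, so it measures contention
     * events; the wall-clock cost also depends on how long each sleep
     * lasts and on the cache-line traffic generated while spinning,
     * which is why spin counts and time spent are different quantities. */
    static void
    spin_acquire(atomic_flag *lock, long *delay_counter)
    {
        int spins = 0;

        while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        {
            if (++spins >= SPINS_PER_DELAY)
            {
                (*delay_counter)++;    /* "exceeded spins_per_delay" */
                usleep(1000);          /* back off for ~1 ms */
                spins = 0;
            }
        }
    }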

Re: [HACKERS] lazy vxid locks, v1

2011-06-12 Thread Robert Haas
On Sun, Jun 12, 2011 at 5:58 PM, Greg Stark st...@mit.edu wrote: On Sun, Jun 12, 2011 at 10:39 PM, Robert Haas robertmh...@gmail.com wrote: I hacked up the system to report how often each lwlock spinlock exceeded spins_per_delay. I don't doubt the rest of your analysis, but one thing to note: