Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-11-15 Thread Bruce Momjian
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --- Jignesh K. Shah wrote: I changed CLOG buffers to 16. Running the test again: # ./read.d dtrace: script

Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Gregory Stark
Josh Berkus [EMAIL PROTECTED] writes: Actually, 32 made a significant difference as I recall ... do you still have the figures for that, Jignesh? Well, it made a difference, but it didn't remove the bottleneck; it just moved it. IIRC under that benchmark Jignesh was able to run with x sessions

Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah
I agree with Tom... somehow I think increasing NUM_CLOG_BUFFERS just pushes the symptom out to a later value. I promise to look more into it before making any recommendations to increase NUM_CLOG_BUFFERS, because though iGen showed improvements in that area by increasing num_clog_buffers

Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah
The problem I saw was first highlighted by EAStress runs with PostgreSQL on Solaris with 120-150 users. I just replicated that via my smaller internal benchmark that we use here to recreate that problem. EAStress should be just fine to highlight it. Just put pg_clog on O_DIRECT or something

Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah
Tom, here is what I did: I started aggregating all read information. First I also added a group by pid (arg0, arg1, pid) and the counts were all coming back as 1. Then I just grouped by filename and location (arg0, arg1 of the reads) and the counts came back as: # cat read.d #!/usr/sbin/dtrace
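The preview cuts the script off, so as a rough illustration only, here is a minimal sketch of what such a read-aggregation script could look like; it is not the original read.d. It assumes the backend processes are named postgres, that the fds[] translator is available (Solaris 10), and that matching any file whose path contains pg_clog is close enough to what the original script filtered on:

#!/usr/sbin/dtrace -s
/* Sketch: count read() calls against pg_clog files, keyed by pathname. */
syscall::read:entry
/execname == "postgres" && strstr(fds[arg0].fi_pathname, "pg_clog") != NULL/
{
        @reads[fds[arg0].fi_pathname] = count();
}

/* Print and reset the aggregation every 5 seconds. */
tick-5sec
{
        printa(@reads);
        trunc(@reads);
}

Run as root while the benchmark is active (chmod +x read.d; ./read.d); each 5-second report then covers only the reads from the last interval, which makes it easy to compare runs before and after changing NUM_CLOG_BUFFERS.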

Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah
Also, to give perspective on the equivalent writes on CLOG, I used the following script, which runs for 10 sec to track all writes to the clog directory, and here is what it came up with... (This is with 500 users running.) # cat write.d #!/usr/sbin/dtrace -s syscall::write:entry
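Again the preview truncates the script, so here is a hedged sketch, not the original write.d, of a 10-second write tracker under the same assumptions (postgres process name, fds[] translator, pg_clog path substring); it keys the aggregation on both pathname and write size (arg2 of write) so the size of each CLOG write is visible:

#!/usr/sbin/dtrace -s
/* Sketch: count write() calls against pg_clog files, keyed by pathname and size. */
syscall::write:entry
/execname == "postgres" && strstr(fds[arg0].fi_pathname, "pg_clog") != NULL/
{
        @writes[fds[arg0].fi_pathname, arg2] = count();
}

/* After 10 seconds, dump the aggregation and stop tracing. */
tick-10sec
{
        printa(@writes);
        exit(0);
}

Comparing the totals from this script with the read counts above gives the read-vs-write ratio that the follow-up messages discuss.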

Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Tom Lane
Jignesh K. Shah [EMAIL PROTECTED] writes: So the ratio of reads vs. writes to clog files is pretty huge. It looks to me that the issue is simply one of not having quite enough CLOG buffers. Your first run shows 8 different pages being fetched and the second shows 10. Bearing in mind that we

Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-26 Thread Jignesh K. Shah
I changed CLOG buffers to 16. Running the test again: # ./read.d dtrace: script './read.d' matched 2 probes CPU ID FUNCTION:NAME 0 1027 :tick-5sec /export/home0/igen/pgdata/pg_clog/0024 -27530282192961

[HACKERS] 8.3beta1 testing on Solaris

2007-10-25 Thread Jignesh K. Shah
Update on my testing of 8.3beta1 on Solaris: * CLOG reads * Asynchronous Commit benefit * Hot CPU Utilization Regards, Jignesh _Background:_ We were testing PostgreSQL 8.3beta1 on our latest Sun SPARC Enterprise T5220 server running Solaris 10 8/07 and a Sun Fire X4200 running Solaris 10

Re: [HACKERS] 8.3beta1 testing on Solaris

2007-10-25 Thread Tom Lane
Jignesh K. Shah [EMAIL PROTECTED] writes: CLOG data is not cached in any PostgreSQL shared memory segments. The above statement is utterly false, so your trace seems to indicate something broken. Are you sure these were the only reads of pg_clog files? Can you extend the tracing to determine

Re: [HACKERS] 8.3beta1 testing on Solaris

2007-10-25 Thread Gregory Stark
Jignesh K. Shah [EMAIL PROTECTED] writes: CLOG data is not cached in any PostgreSQL shared memory segments and hence becomes the bottleneck, as it constantly has to go to the filesystem to read the data. This is the same bottleneck you discussed earlier. CLOG reads are cached in the

Re: [HACKERS] 8.3beta1 testing on Solaris

2007-10-25 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: Jignesh K. Shah [EMAIL PROTECTED] writes: CLOG data is not cached in any PostgreSQL shared memory segments. The above statement is utterly false, so your trace seems to indicate something broken. Are you sure these were the only reads of pg_clog files?

Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-25 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes: Didn't we already go through this? He and Simon were pushing to bump up NUM_CLOG_BUFFERS, and you were arguing that the test wasn't representative and that clog.c would have to be reengineered to scale well to larger values. AFAIR we never did get

Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-25 Thread Josh Berkus
Tom, It's still true that I'm leery of a large increase in the number of buffers without reengineering slru.c. That code was written on the assumption that there were few enough buffers that a linear search would be fine. I'd hold still for 16, or maybe even 32, but I dunno how much impact

Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

2007-10-25 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes: Actually, 32 made a significant difference as I recall ... do you still have the figures for that, Jignesh? I'd want to see a new set of test runs backing up any call for a change in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that