Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Tom Lane
"Kevin Grittner" writes: > Robert Haas wrote: >> So should we give up on this patch? > No, this is not news; just confirmation of the earlier gut feelings > and less convincing statistics that there is no problem. Tom's > argument that if there's no slowdown for common cases, preventing > O(N

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Kevin Grittner
I wrote: > I remember someone else on the thread saying [...] > it provided better structure for future enhancements. Found the reference: http://archives.postgresql.org/pgsql-hackers/2009-08/msg00078.php This was the email which I thought confirmed that the changes were worth it, even in

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Robert Haas
On Fri, Aug 7, 2009 at 3:36 PM, Sam Mason wrote: > On Fri, Aug 07, 2009 at 03:18:54PM -0400, Robert Haas wrote: >> On Fri, Aug 7, 2009 at 3:08 PM, Kevin Grittner >> wrote: >> > With the 20 samples from that last round of tests, the answer (rounded >> > to the nearest percent) is 60%, so "probably

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Kevin Grittner
Robert Haas wrote: > So should we give up on this patch? No, this is not news; just confirmation of the earlier gut feelings and less convincing statistics that there is no problem. Tom's argument that if there's no slowdown for common cases, preventing O(N^2) behavior for extreme cases is c

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Sam Mason
On Fri, Aug 07, 2009 at 03:18:54PM -0400, Robert Haas wrote: > On Fri, Aug 7, 2009 at 3:08 PM, Kevin Grittner > wrote: > > With the 20 samples from that last round of tests, the answer (rounded > > to the nearest percent) is 60%, so "probably noise" is a good summary. > > So should we give up on

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Sam Mason
On Fri, Aug 07, 2009 at 02:08:21PM -0500, Kevin Grittner wrote: > With the 20 samples from that last round of tests, the answer (rounded > to the nearest percent) is 60%, so "probably noise" is a good summary. > Combined with the 12 samples from earlier comparable runs with the > prior version of t

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Robert Haas
On Fri, Aug 7, 2009 at 3:08 PM, Kevin Grittner wrote: > Sam Mason wrote: > >> All we're saying is that we're less than 90% confident that there's >> something "significant" going on. All the fiddling with standard >> deviations and sample sizes is just easiest way (that I know of) >> that statist

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Kevin Grittner
Sam Mason wrote: > All we're saying is that we're less than 90% confident that there's > something "significant" going on. All the fiddling with standard > deviations and sample sizes is just easiest way (that I know of) > that statistics currently gives us of determining this more formally >

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Sam Mason
On Fri, Aug 07, 2009 at 10:39:19AM -0500, Kevin Grittner wrote: > Sam Mason wrote: > > Yes, all that sounds as though you've got it. > > Thanks. I read through it carefully a few times, but I was still only > 80% confident that I had it more-or-less right. ;-) And which method did you use to

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Kevin Grittner
Sam Mason wrote: > Yes, all that sounds as though you've got it. Thanks. I read through it carefully a few times, but I was still only 80% confident that I had it more-or-less right. ;-) That does seem like a good test, with the advantage of being relatively easy to calculate. Thanks aga

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Sam Mason
On Fri, Aug 07, 2009 at 10:19:20AM -0500, Kevin Grittner wrote: > Sam Mason wrote: > > > What do people do when testing this? I think I'd look to something > > like Student's t-test to check for statistical significance. My > > working would go something like: > > > > I assume the variance

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Kevin Grittner
Sam Mason wrote: > What do people do when testing this? I think I'd look to something > like Student's t-test to check for statistical significance. My > working would go something like: > > I assume the variance is the same because it's being tested on the > same machine. > > samples

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-04 Thread Sam Mason
On Tue, Aug 04, 2009 at 10:45:52AM -0400, Tom Lane wrote: > Sam Mason writes: > > t = 0.54 ((avg1 - avg2) / (stddev * sqrt(2/samples))) > > > We then have to choose how certain we want to be that they're actually > > different, 90% is a reasonably easy level to hit (i.e. one part in ten,

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-04 Thread Tom Lane
Sam Mason writes: > t = 0.54 ((avg1 - avg2) / (stddev * sqrt(2/samples))) > We then have to choose how certain we want to be that they're actually > different, 90% is a reasonably easy level to hit (i.e. one part in ten, > with 95% being more commonly quoted). For 20 samples we have 19
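For readers following the arithmetic, the formula quoted above, t = (avg1 - avg2) / (stddev * sqrt(2/samples)), can be sketched in a few lines of Python. This is only an illustration: the thread does not show how stddev was computed, so the pooled standard deviation below is an assumption (it matches the stated premise that both runs share the same variance), and the function name and the timings in the usage note are hypothetical.

```python
import math
import statistics


def t_statistic(sample1, sample2):
    """t statistic for two equal-sized samples assumed to share one variance."""
    if len(sample1) != len(sample2):
        raise ValueError("this form of the formula assumes equal sample counts")
    samples = len(sample1)
    avg1 = statistics.mean(sample1)
    avg2 = statistics.mean(sample2)
    # Assumption: stddev is the pooled sample standard deviation. With equal
    # sample counts that is the square root of the mean of the two variances.
    stddev = math.sqrt(
        (statistics.variance(sample1) + statistics.variance(sample2)) / 2
    )
    return (avg1 - avg2) / (stddev * math.sqrt(2 / samples))
```

Calling `t_statistic(patched_times, unpatched_times)` with two lists of restore timings (hypothetical names) gives the value that is then compared against a t-table at the chosen confidence level.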

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-04 Thread Sam Mason
On Mon, Aug 03, 2009 at 10:03:47AM -0500, Kevin Grittner wrote: > That's about 0.52% slower with the patch. Because there was over 10% > variation in the numbers with the patch, I tried leaving out the four > highest outliers on both, in case it was the result of some other > activity on the syste

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-03 Thread daveg
On Mon, Aug 03, 2009 at 11:21:43AM -0400, Tom Lane wrote: > "Kevin Grittner" writes: > > Over the weekend I ran 40 restores of Milwaukee County's production > > data using Friday's snapshot with and without the patch. I alternated > > between patched and unpatched. It appears that this latest ve

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-03 Thread Josh Berkus
IIRC daveg was volunteering to do some tests with his own data; maybe we should wait for those results. Unfortunately, I've lost access to the client's data which was showing bad behaviour under the first heuristic. -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-03 Thread Tom Lane
"Kevin Grittner" writes: > Over the weekend I ran 40 restores of Milwaukee County's production > data using Friday's snapshot with and without the patch. I alternated > between patched and unpatched. It appears that this latest version is > slightly slower for our production database on the same

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-03 Thread Andrew Dunstan
That's about 0.52% slower with the patch. Because there was over 10% variation in the numbers with the patch, I tried leaving out the four highest outliers on both, in case it was the result of some other activity on the system (even though this machine should have been pretty quiet over the we

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-03 Thread Kevin Grittner
I wrote: > Tom Lane wrote: >> Attached is a further small improvement that gets rid of the >> find_ready_items() scans. After re-reading the patch I realized >> that it wasn't *really* avoiding O(N^2) behavior ... but this >> version does. > > I'll run a fresh set of benchmarks. Over the

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-31 Thread Tom Lane
daveg writes: > Will the patch apply to a vanilla 8.4.0? Yeah, it should. The line numbers in the version I just posted might be off a little bit for 8.4.0, but patch should cope. Be sure to "make clean" and recompile all of src/bin/pg_dump, else you might have some issues.

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-31 Thread daveg
On Thu, Jul 30, 2009 at 12:29:34PM -0500, Kevin Grittner wrote: > Tom Lane wrote: > > > I think we've pretty much established that it doesn't make things > > *worse*, so I'm sort of inclined to go ahead and apply it. The > > theoretical advantage of eliminating O(N^2) search behavior seems > >

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-31 Thread Kevin Grittner
Tom Lane wrote: > "Kevin Grittner" writes: >> Rebased to correct for pg_indent changes. > > Thanks for doing that. No problem. I think I still owe you a few. :-) > Attached is a further small improvement that gets rid of the > find_ready_items() scans. After re-reading the patch I realiz

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-31 Thread Tom Lane
"Kevin Grittner" writes: > Rebased to correct for pg_indent changes. Thanks for doing that. Attached is a further small improvement that gets rid of the find_ready_items() scans. After re-reading the patch I realized that it wasn't *really* avoiding O(N^2) behavior ... but this version does.
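The patch itself is not reproduced in this listing, so the following is not pg_restore's code. It is a generic Python sketch of the idea behind dropping repeated scans: keep a count of unmet dependencies per item and move an item to a ready queue the moment its count reaches zero, instead of rescanning every pending item after each completion. All names here (`restore_order`, `deps`, the item labels) are hypothetical.

```python
from collections import deque


def restore_order(deps):
    """Return a valid processing order; deps maps item -> set of prerequisites."""
    # Number of prerequisites each item is still waiting on.
    remaining = {item: len(prereqs) for item, prereqs in deps.items()}
    # Reverse map: which items are waiting on a given item.
    dependents = {item: [] for item in deps}
    for item, prereqs in deps.items():
        for prereq in prereqs:
            dependents[prereq].append(item)

    ready = deque(item for item, count in remaining.items() if count == 0)
    order = []
    while ready:
        item = ready.popleft()
        order.append(item)
        # Only the direct dependents are touched, so total work is
        # proportional to items plus dependency edges, not items squared.
        for child in dependents[item]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    return order
```

With `deps = {"table": set(), "index": {"table"}, "fk": {"table", "index"}}`, the table is processed first, then the index, then the foreign key. A real parallel scheduler would hand ready items to worker processes rather than append them to a list.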

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-30 Thread Kevin Grittner
Tom Lane wrote: > I think we've pretty much established that it doesn't make things > *worse*, so I'm sort of inclined to go ahead and apply it. The > theoretical advantage of eliminating O(N^2) search behavior seems > like reason enough, even if it takes a ridiculous number of tables > for th

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-30 Thread Robert Haas
On Thu, Jul 30, 2009 at 1:24 PM, Tom Lane wrote: > "Kevin Grittner" writes: >> The timings vary by up to 2.5% between runs, so that's the noise >> level. Five runs of each (alternating between the two) last night >> give an average performance of 1.89% faster for the patched version. >> Combining

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-30 Thread Tom Lane
"Kevin Grittner" writes: > The timings vary by up to 2.5% between runs, so that's the noise > level. Five runs of each (alternating between the two) last night > give an average performance of 1.89% faster for the patched version. > Combining that with yesterday's results starts to give me prett

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-30 Thread Kevin Grittner
"Kevin Grittner" wrote: > with the default settings, the patched version ran an additional 1% > faster than the unpatched; although I don't have enough samples to > have a high degree of confidence it wasn't noise. I'll run another > slew of tests tonight with the existing dump file to confirm

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-29 Thread Tom Lane
"Kevin Grittner" writes: > Tom Lane wrote: >> Also, the followup to that message points out that the 8.4.0 code >> has a potential O(N^2) dependency on the total number of TOC items >> in the dump. So it might be interesting to check the behavior with >> very large numbers of tables/indexes.

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-29 Thread Kevin Grittner
Tom Lane wrote: > Also, the followup to that message points out that the 8.4.0 code > has a potential O(N^2) dependency on the total number of TOC items > in the dump. So it might be interesting to check the behavior with > very large numbers of tables/indexes. I've got 431 user tables with

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-29 Thread Kevin Grittner
Robert Haas wrote: > This is what I've been able to find on a quick look: > > http://archives.postgresql.org/pgsql-hackers/2009-05/msg00678.php > > Sounds like Kevin may want to try renaming some of his indices to > produce intermingling... Thanks, I'll give that a try. Renaming them is on

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-29 Thread Tom Lane
Robert Haas writes: > On Tue, Jul 28, 2009 at 9:52 PM, Tom Lane wrote: >> I don't have time to look right now, but ISTM the original discussion >> that led to making that patch had ideas about scenarios where it would >> be faster. > This is what I've been able to find on a quick look: > http://a

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-29 Thread Robert Haas
On Tue, Jul 28, 2009 at 9:52 PM, Tom Lane wrote: > Robert Haas writes: >> The other possibility here is that this just doesn't work. :-) > > That's why we wanted to test it ;-). > > I don't have time to look right now, but ISTM the original discussion > that led to making that patch had ideas abo

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-28 Thread Tom Lane
Robert Haas writes: > The other possibility here is that this just doesn't work. :-) That's why we wanted to test it ;-). I don't have time to look right now, but ISTM the original discussion that led to making that patch had ideas about scenarios where it would be faster. It'd be worth diggin

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-28 Thread Robert Haas
On Tue, Jul 28, 2009 at 10:28 AM, Kevin Grittner wrote: > I wrote: > >> So far, all tests have shown no difference in performance based on >> the patch; > > My testing to that point had been on a "big" machine with 16 CPUs and > 128 GB RAM and dozens of spindles. Last night I tried with a dual > c

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-28 Thread Kevin Grittner
I wrote: > So far, all tests have shown no difference in performance based on > the patch; My testing to that point had been on a "big" machine with 16 CPUs and 128 GB RAM and dozens of spindles. Last night I tried with a dual core machine with 4 GB RAM and 5 spindles in RAID 5. Still no dif

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-27 Thread Kevin Grittner
Andrew Dunstan wrote: > Does your test case have lots of foreign keys? 488 of them. There is some variation on individual tests, but the results look to be "in the noise." When I add them all up, the patch comes out 0.0036% slower -- but that is so far into the noise as to be considered "no

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-27 Thread Andrew Dunstan
Kevin Grittner wrote: Andrew Dunstan wrote: To performance test this properly you might need to devise a test that will use a sufficiently different order of queueing items to show the difference. It would appear that I need help with devising a proper test. So far, all tests ha

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-27 Thread Kevin Grittner
Andrew Dunstan wrote: > To performance test this properly you might need to devise a test > that will use a sufficiently different order of queueing items to > show the difference. It would appear that I need help with devising a proper test. So far, all tests have shown no difference in perf

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-20 Thread Kevin Grittner
Stefan Kaltenbrunner wrote: >> My plan here would be to have >> the dump on one machine, and run pg_restore there, and push it to a >> database on another machine through the LAN on a 1Gb connection. >> (This seems most likely to be what we'd be doing in real life.) > you need to be careful he

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-20 Thread Kevin Grittner
Robert Haas wrote: > it might be worth testing with default settings too. OK. I'll do that too, if time allows. -Kevin

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-20 Thread Kevin Grittner
Andrew Dunstan wrote: > To performance test this properly you might need to devise a test > that will use a sufficiently different order of queueing items to > show the difference. > > One thing I am particularly interested in is to see if queuing FK > items for a table as soon as they become a

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-19 Thread Josh Berkus
Kevin, It would be hard to schedule the requisite time on our biggest web machines, but I assume an 8 core 64GB machine would give meaningful results. Any sense what numbers of parallel jobs I should use for tests? I would be tempted to try 1 (with the -1 switch), 8, 12, and 16 -- maybe keep g

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-19 Thread Andrew Dunstan
Robert Haas wrote: On Sat, Jul 18, 2009 at 4:41 PM, Kevin Grittner wrote: "Kevin Grittner" wrote: Performance tests to follow in a day or two. I'm looking to beg another week or so on this to run more tests. What I can have by the end of today is pretty limited, mostly beca

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-19 Thread Stefan Kaltenbrunner
Kevin Grittner wrote: "Kevin Grittner" wrote: Performance tests to follow in a day or two. I'm looking to beg another week or so on this to run more tests. What I can have by the end of today is pretty limited, mostly because I decided it made the most sense to test this with big complex

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-19 Thread Robert Haas
On Sat, Jul 18, 2009 at 4:41 PM, Kevin Grittner wrote: > "Kevin Grittner" wrote: > >> Performance tests to follow in a day or two. > > I'm looking to beg another week or so on this to run more tests. What > I can have by the end of today is pretty limited, mostly because I > decided it made the m

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-18 Thread Kevin Grittner
"Kevin Grittner" wrote: > Performance tests to follow in a day or two. I'm looking to beg another week or so on this to run more tests. What I can have by the end of today is pretty limited, mostly because I decided it made the most sense to test this with big complex databases, and it just

[HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-16 Thread Kevin Grittner
Rebased to correct for pg_indent changes. Applies cleanly. Compiles cleanly. Passes regression tests. Comments and format look good. No documentation changes needed. No regression test changes needed. Performance tests to follow in a day or two. -Kevin Index: src/bin/pg_dump/pg_backup_archive