Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Simon Riggs
On Thu, 2007-12-13 at 06:27 +, Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Joshua D. Drake [EMAIL PROTECTED] writes: Exactly. Which is the point I am making. Five minutes of transactions is nothing (speaking generally).. In short, if we are in recovery, and we are not

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes: On Thu, 2007-12-13 at 06:27 +, Gregory Stark wrote: Heikki proposed a while back to use posix_fadvise() when processing logs to read-ahead blocks which the recover will need before actually attempting to recover them. On a raid array that would bring

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Simon Riggs
On Thu, 2007-12-13 at 09:45 +, Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: On Thu, 2007-12-13 at 06:27 +, Gregory Stark wrote: Heikki proposed a while back to use posix_fadvise() when processing logs to read-ahead blocks which the recover will need before actually

[HACKERS] [Fwd: [PATCHES] archiver ps display]

2007-12-13 Thread Simon Riggs
Now that we have set archiver messages to DEBUG levels, there's no easy way of seeing what file we are currently working on. The behaviour of the startup process and archiver should be symmetrical, allowing it to be used for replication monitoring. Before archive_command postgres: archiver

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes: We would have readbuffers in shared memory, like wal_buffers in reverse. Each worker would read the next WAL record and check there is no conflict with other concurrent WAL records. If not, it will apply the record immediately, otherwise wait for the

Re: [HACKERS] [DOCS] distributed checkpoint

2007-12-13 Thread Alvaro Herrera
Josh Berkus wrote: All, Just FYI, it's going to be difficult to replace the name of the feature in the PR docs at this point; I already have 11 translations. What's *wrong* with Load Distributed Checkpoint, which is what we've been calling it for 6 months? Is a translator really not

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Heikki Linnakangas
Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: We would have readbuffers in shared memory, like wal_buffers in reverse. Each worker would read the next WAL record and check there is no conflict with other concurrent WAL records. If not, it will apply the record immediately,

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Guillaume Smet
Simon, On Dec 13, 2007 11:21 AM, Simon Riggs [EMAIL PROTECTED] wrote: Anyway, I'll leave this now, since I think we need to do Florian's work first either way and that is much more eagerly awaited I think. Speaking of that, is there any news about it and about Florian? It was a really

[HACKERS] tuplesort.c

2007-12-13 Thread Manolo _
http://pastebin.com/f304b279e ,,,the tuple we last output,,, What's the problem with just keeping the the key of the last output tuple (not the whole tuple)? I mean, keeping that key will let us writing some incoming tuples (from input) directly into the current run avoiding storing it into

Re: [HACKERS] pgwin32_open returning EINVAL

2007-12-13 Thread Alvaro Herrera
Alvaro Herrera wrote: So _anything_ could be EINVAL. Including the several cases that _dosmaperr treat as EACCES. So I'm afraid that for this experiment to be successful, we would have to remove not only the EINVAL cases from doserrors[], but also any other code that appears more than once

Re: [HACKERS] [PATCHES] result of convert_to is bytea

2007-12-13 Thread Alvaro Herrera
Pavel Stehule escribiĆ³: Hello documentation fix. result of convert_to is bytea, not text. Applied, thanks. -- Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC No deja de ser humillante para una persona de ingenio saber que no hay tonto que no le pueda enseƱar

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Alvaro Herrera
Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: It's a good idea, but it will require more complex code. I prefer the simpler solution of using more processes to solve the I/O problem. Huh, I forgot about that idea. Ironically that was what I suggested when Heikki described

[HACKERS] New style of hash join proposal

2007-12-13 Thread Gregory Stark
We currently execute a lot of joins as Nested Loops which would be more efficient if we could batch together all the outer keys and execute a single inner bitmap index scan for all of them together. Essentially what I'm saying is that we're missing a trick with Hash Joins which currently

Re: [HACKERS] [Fwd: [PATCHES] archiver ps display]

2007-12-13 Thread Alvaro Herrera
Simon Riggs wrote: Now that we have set archiver messages to DEBUG levels, there's no easy way of seeing what file we are currently working on. The behaviour of the startup process and archiver should be symmetrical, allowing it to be used for replication monitoring. Before

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Alvaro Herrera
Simon Riggs wrote: ISTM its just autovacuum launcher + Hot Standby mixed. I don't think you need a launcher at all. Just get the postmaster to start a configurable number of wal-replay processes (currently the number is hardcoded to 1). -- Alvaro Herrera

Re: [HACKERS] [Fwd: [PATCHES] archiver ps display]

2007-12-13 Thread Simon Riggs
On Thu, 2007-12-13 at 11:16 -0300, Alvaro Herrera wrote: Simon Riggs wrote: On Thu, 2007-12-13 at 10:55 -0300, Alvaro Herrera wrote: I agree that replication should be able to be monitored. However, isn't ps_display supposed to show what the process is _currently_ doing? So if the

Re: [HACKERS] [Fwd: [PATCHES] archiver ps display]

2007-12-13 Thread Alvaro Herrera
Simon Riggs wrote: On Thu, 2007-12-13 at 10:55 -0300, Alvaro Herrera wrote: I agree that replication should be able to be monitored. However, isn't ps_display supposed to show what the process is _currently_ doing? So if the archiver finishes processing a file, its display should go

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Simon Riggs
On Thu, 2007-12-13 at 12:28 +, Heikki Linnakangas wrote: Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: We would have readbuffers in shared memory, like wal_buffers in reverse. Each worker would read the next WAL record and check there is no conflict with other

Re: [HACKERS] [Fwd: [PATCHES] archiver ps display]

2007-12-13 Thread Simon Riggs
On Thu, 2007-12-13 at 10:55 -0300, Alvaro Herrera wrote: Simon Riggs wrote: Now that we have set archiver messages to DEBUG levels, there's no easy way of seeing what file we are currently working on. The behaviour of the startup process and archiver should be symmetrical, allowing

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Simon Riggs
On Thu, 2007-12-13 at 10:18 -0300, Alvaro Herrera wrote: Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: It's a good idea, but it will require more complex code. I prefer the simpler solution of using more processes to solve the I/O problem. Huh, I forgot about that

Re: [HACKERS] [Fwd: [PATCHES] archiver ps display]

2007-12-13 Thread Peter Eisentraut
Am Donnerstag, 13. Dezember 2007 schrieb Simon Riggs: Now that we have set archiver messages to DEBUG levels, there's no easy way of seeing what file we are currently working on. After archive_command postgres: archiver process archived 0001000B ps isn't a very robust and

Re: [HACKERS] [Fwd: [PATCHES] archiver ps display]

2007-12-13 Thread Simon Riggs
On Thu, 2007-12-13 at 15:58 +0100, Peter Eisentraut wrote: Am Donnerstag, 13. Dezember 2007 schrieb Simon Riggs: Now that we have set archiver messages to DEBUG levels, there's no easy way of seeing what file we are currently working on. After archive_command postgres: archiver process

Re: [HACKERS] VLDB Features

2007-12-13 Thread Markus Schiltknecht
Hello Gregory, Gregory Stark wrote: Oracle is using Direct I/O so they need the reader and writer threads to avoid blocking on i/o all the time. We count on the OS doing readahead and buffering our writes so we don't have to. Direct I/O and needing some way to do asynchronous writes and reads

Re: [HACKERS] [DOCS] distributed checkpoint

2007-12-13 Thread Joshua D. Drake
Alvaro Herrera wrote: Josh Berkus wrote: All, Just FYI, it's going to be difficult to replace the name of the feature in the PR docs at this point; I already have 11 translations. What's *wrong* with Load Distributed Checkpoint, which is what we've been calling it for 6 months? Is a

Re: [HACKERS] [DOCS] distributed checkpoint

2007-12-13 Thread Gregory Stark
Joshua D. Drake [EMAIL PROTECTED] writes: Alvaro Herrera wrote: Is a translator really not able to change 3 words in a week? Come again. I think it is likely more about being able to reach the translators. The more common ones such as yourself are obvious but others may not be. Either

Re: [HACKERS] [DOCS] distributed checkpoint

2007-12-13 Thread Josh Berkus
Greg, All: The other problem was that the original description over-sold the feature a bit. It said prevent I/O spikes when it actually just reduces them. Still possible to have a spike, it probably won't be as big though. Your call on whether correcting that mischaracterization is worth

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Heikki Linnakangas
Simon Riggs wrote: Allocate a recovery cache of size maintenance_work_mem that goes away when recovery ends. For every block mentioned in WAL record that isn't an overwrite, first check shared_buffers. If its in shared_buffers apply immediately and move on. If not in shared_buffers then put in

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Simon Riggs
On Thu, 2007-12-13 at 20:25 +, Heikki Linnakangas wrote: Simon Riggs wrote: Allocate a recovery cache of size maintenance_work_mem that goes away when recovery ends. For every block mentioned in WAL record that isn't an overwrite, first check shared_buffers. If its in

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Simon Riggs
On Thu, 2007-12-13 at 21:13 +, Simon Riggs wrote: Of course if we scan that far ahead we can start removing aborted transactions also, which is the more standard optimization of recovery. Recall that thought! -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: Hmm. That assumes that nothing else than the WAL replay will read pages into shared buffers. I guess that's true at the moment, but it doesn't seem impossible that something like Florian's read-only queries on a stand by server would change that.

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Greg Smith
On Thu, 13 Dec 2007, Gregory Stark wrote: Note that even though the processor is 99% in wait state the drive is only handling about 3 MB/s. That translates into a seek time of 2.2ms which is actually pretty fast...But note that if this were a raid array Postgres's wouldn't be getting any

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Thu, 13 Dec 2007 11:12:26 -0800 Joshua D. Drake [EMAIL PROTECTED] wrote: Hmm --- I was testing a straight crash-recovery scenario, not restoring from archive. Are you sure your restore_command script isn't responsible for a lot of the

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Simon Riggs
On Thu, 2007-12-13 at 16:41 -0500, Tom Lane wrote: Recovery is inherently one of the least-exercised parts of the system, and it gets more so with each robustness improvement we make elsewhere. Moreover, because it is fairly dumb, anything that does go wrong will likely result in silent data

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Heikki Linnakangas
Tom Lane wrote: Also, I have not seen anyone provide a very credible argument why we should spend a lot of effort on optimizing a part of the system that is so little-exercised. Don't tell me about warm standby systems --- they are fine as long as recovery is at least as fast as the original

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: Koichi showed me Simon graphs of DBT-2 runs in their test lab back in May. They had setup two identical systems, one running the benchmark, and another one as a warm stand-by. The stand-by couldn't keep up; it couldn't replay the WAL as quickly

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes: It would be interesting to do something like that to speed up replay of long PITR archives, though. You could scan all (or at least far ahead) the WAL records, and make note of where there is full page writes for each page. Whenever there's a full

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Josh Berkus
Tom, [ shrug... ] This is not consistent with my experience. I can't help suspecting misconfiguration; perhaps shared_buffers much smaller on the backup, for example. You're only going to see it on SMP systems which have a high degree of CPU utilization. That is, when you have 16 cores

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes: Tom, [ shrug... ] This is not consistent with my experience. I can't help suspecting misconfiguration; perhaps shared_buffers much smaller on the backup, for example. You're only going to see it on SMP systems which have a high degree of CPU

Re: [HACKERS] [GENERAL] Slow PITR restore

2007-12-13 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes: The argument that Heikki actually made was that multiple parallel queries could use more of the I/O bandwidth of a multi-disk array than recovery could. Which I believe, but I question how much of a real-world problem it is. For it to be an issue, you'd

[HACKERS] Negative LIMIT and OFFSET?

2007-12-13 Thread Simon Riggs
I'm fairly surprised these queries work. Is there some reason why we support this? April Fools Day? Jules Verne? I'm all for fast queries, but zero seems like the lowest value we should support... postgres=# select * from accounts limit -9; aid | bid | abalance | filler

Re: [HACKERS] Negative LIMIT and OFFSET?

2007-12-13 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes: I'm fairly surprised these queries work. Is there some reason why we support this? April Fools Day? Jules Verne? I'm all for fast queries, but zero seems like the lowest value we should support... Huh, I was all set to post an example of a useful

Re: [HACKERS] Negative LIMIT and OFFSET?

2007-12-13 Thread Andrew Sullivan
On Fri, Dec 14, 2007 at 01:47:23AM +, Gregory Stark wrote: Huh, I was all set to post an example of a useful application of it but then apparently I'm wrong and it doesn't work: I dimly remember some discussion of this issue once before, maybe a year ago. My memory isn't what it was, and I

Re: [HACKERS] Negative LIMIT and OFFSET?

2007-12-13 Thread Jonah H. Harris
On Dec 13, 2007 9:43 PM, Andrew Sullivan [EMAIL PROTECTED] wrote: I dimly remember some discussion of this issue once before, maybe a year ago. My memory isn't what it was, and I can't find it by trolling archives, but I recall Tom saying that it was dumb, yes, but don't do that, because

Re: [HACKERS] Negative LIMIT and OFFSET?

2007-12-13 Thread Andrew Sullivan
On Thu, Dec 13, 2007 at 10:06:35PM -0500, Tom Lane wrote: of a negative limit, it's meeting its spec. If you want to throw an error for negative limit, shouldn't you logically also throw an error Should it be a WARNING? A ---(end of

Re: [HACKERS] Negative LIMIT and OFFSET?

2007-12-13 Thread Andrew Sullivan
On Thu, Dec 13, 2007 at 10:01:43PM -0500, Jonah H. Harris wrote: Man, maybe my mad Google skillz are not as mad as I thought :( Hey, I worked in a library some years ago, when Google was just a googlet, and I couldn't find it either. It's a dim memory, note. Which could mean artifact. I'm

Re: [HACKERS] Negative LIMIT and OFFSET?

2007-12-13 Thread Tom Lane
Andrew Sullivan [EMAIL PROTECTED] writes: On Fri, Dec 14, 2007 at 01:47:23AM +, Gregory Stark wrote: Huh, I was all set to post an example of a useful application of it but then apparently I'm wrong and it doesn't work: I dimly remember some discussion of this issue once before, maybe a

Re: [HACKERS] Negative LIMIT and OFFSET?

2007-12-13 Thread Jonah H. Harris
On Dec 13, 2007 10:06 PM, Tom Lane [EMAIL PROTECTED] wrote: I guess that on purely philosophical grounds, it's not an unreasonable behavior. For example, LIMIT n means output at most n tuples, not output exactly n tuples. So when it outputs no tuples in the face of a negative limit, it's

Re: [HACKERS] Negative LIMIT and OFFSET?

2007-12-13 Thread Tom Lane
Jonah H. Harris [EMAIL PROTECTED] writes: Don't we have any similar usability cases in the system like this, where negatives are not allowed only for the sake of it being an insane setting? I'm tired, but I thought we did. Yeah, probably. It's the kind of thing where the call is close enough

Re: [HACKERS] Negative LIMIT and OFFSET?

2007-12-13 Thread Merlin Moncure
On Dec 13, 2007 10:06 PM, Tom Lane [EMAIL PROTECTED] wrote: Andrew Sullivan [EMAIL PROTECTED] writes: On Fri, Dec 14, 2007 at 01:47:23AM +, Gregory Stark wrote: Huh, I was all set to post an example of a useful application of it but then apparently I'm wrong and it doesn't work: I

Re: [HACKERS] Negative LIMIT and OFFSET?

2007-12-13 Thread Neil Conway
On Thu, 2007-12-13 at 22:06 -0500, Tom Lane wrote: I guess that on purely philosophical grounds, it's not an unreasonable behavior. For example, LIMIT n means output at most n tuples, not output exactly n tuples. So when it outputs no tuples in the face of a negative limit, it's meeting its