Re: [HACKERS] warning message in standby

2010-07-02 Thread Robert Haas
On Tue, Jun 29, 2010 at 10:58 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: If someone is sloppy about how they copy the WAL files around, they could temporarily have a truncated file. Can you explain the scenario you're concerned about in

Re: [HACKERS] warning message in standby

2010-06-29 Thread Fujii Masao
On Tue, Jun 15, 2010 at 11:35 AM, Fujii Masao masao.fu...@gmail.com wrote: On the other hand, I like immediate-panicking. And I don't want the standby to retry reconnecting the master infinitely. On second thought, the peremptory PANIC is not good for HA system. If the master unfortunately has

Re: [HACKERS] warning message in standby

2010-06-29 Thread Robert Haas
On Tue, Jun 29, 2010 at 3:55 AM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jun 15, 2010 at 11:35 AM, Fujii Masao masao.fu...@gmail.com wrote: On the other hand, I like immediate-panicking. And I don't want the standby to retry reconnecting the master infinitely. On second thought, the

Re: [HACKERS] warning message in standby

2010-06-29 Thread Robert Haas
On Tue, Jun 29, 2010 at 6:59 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jun 29, 2010 at 3:55 AM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jun 15, 2010 at 11:35 AM, Fujii Masao masao.fu...@gmail.com wrote: On the other hand, I like immediate-panicking. And I don't want the

Re: [HACKERS] warning message in standby

2010-06-29 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: ...with this patch, following the above, you get: FATAL: invalid record in WAL stream HINT: Take a new base backup, or remove recovery.conf and restart in read-write mode. LOG: startup process (PID 6126) exited with exit code 1 LOG:

Re: [HACKERS] warning message in standby

2010-06-29 Thread Robert Haas
On Tue, Jun 29, 2010 at 10:21 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: ...with this patch, following the above, you get: FATAL:  invalid record in WAL stream HINT:  Take a new base backup, or remove recovery.conf and restart in

Re: [HACKERS] warning message in standby

2010-06-29 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: If someone is sloppy about how they copy the WAL files around, they could temporarily have a truncated file. Can you explain the scenario you're concerned about in more detail? If someone uses cp or scp to copy a WAL file from the pg_xlog

Re: [HACKERS] warning message in standby

2010-06-29 Thread Fujii Masao
On Tue, Jun 29, 2010 at 7:59 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jun 29, 2010 at 3:55 AM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jun 15, 2010 at 11:35 AM, Fujii Masao masao.fu...@gmail.com wrote: On the other hand, I like immediate-panicking. And I don't want the

Re: [HACKERS] warning message in standby

2010-06-29 Thread Robert Haas
On Tue, Jun 29, 2010 at 10:03 PM, Fujii Masao masao.fu...@gmail.com wrote: This is true. But what I'm concerned about is: 1. Backend writes and fsyncs the WAL to the disk 2. The WAL on the disk gets corrupted 3. Walsender reads and sends that corrupted WAL image 4. The master crashes because

Re: [HACKERS] warning message in standby

2010-06-14 Thread Heikki Linnakangas
On 12/06/10 04:19, Bruce Momjian wrote: Robert Haas wrote: If my streaming replication stops working, I want to know about it as soon as possible. WARNING just doesn't cut it. This needs some better thought. If we PANIC, then surely it will PANIC again when we restart unless we do something.

Re: [HACKERS] warning message in standby

2010-06-14 Thread Bruce Momjian
Heikki Linnakangas wrote: On 12/06/10 04:19, Bruce Momjian wrote: Robert Haas wrote: If my streaming replication stops working, I want to know about it as soon as possible. WARNING just doesn't cut it. This needs some better thought. If we PANIC, then surely it will PANIC again when

Re: [HACKERS] warning message in standby

2010-06-14 Thread Magnus Hagander
On Mon, Jun 14, 2010 at 12:16, Bruce Momjian br...@momjian.us wrote: Heikki Linnakangas wrote: On 12/06/10 04:19, Bruce Momjian wrote: Robert Haas wrote: If my streaming replication stops working, I want to know about it as soon as possible. WARNING just doesn't cut it. This needs some

Re: [HACKERS] warning message in standby

2010-06-14 Thread Bruce Momjian
Magnus Hagander wrote: Seems like we need something like WARNING that doesn't cause the process to die, but more alarming like ERROR/FATAL/PANIC. Or maybe just adding a hint to the warning will do. How about WARNING: invalid record length at 0/4005330 HINT: An invalid record was

Re: [HACKERS] warning message in standby

2010-06-14 Thread Magnus Hagander
On Mon, Jun 14, 2010 at 13:11, Bruce Momjian br...@momjian.us wrote: Magnus Hagander wrote: Seems like we need something like WARNING that doesn't cause the process to die, but more alarming like ERROR/FATAL/PANIC. Or maybe just adding a hint to the warning will do. How about WARNING:

Re: [HACKERS] warning message in standby

2010-06-14 Thread Bruce Momjian
Magnus Hagander wrote: On Mon, Jun 14, 2010 at 13:11, Bruce Momjian br...@momjian.us wrote: Magnus Hagander wrote: Seems like we need something like WARNING that doesn't cause the process to die, but more alarming like ERROR/FATAL/PANIC. Or maybe just adding a hint to the warning will

Re: [HACKERS] warning message in standby

2010-06-14 Thread Robert Haas
On Mon, Jun 14, 2010 at 7:18 AM, Magnus Hagander mag...@hagander.net wrote: On Mon, Jun 14, 2010 at 13:11, Bruce Momjian br...@momjian.us wrote: Magnus Hagander wrote: Seems like we need something like WARNING that doesn't cause the process to die, but more alarming like ERROR/FATAL/PANIC.

Re: [HACKERS] warning message in standby

2010-06-14 Thread Heikki Linnakangas
On 14/06/10 13:16, Bruce Momjian wrote: Heikki Linnakangas wrote: On 12/06/10 04:19, Bruce Momjian wrote: Robert Haas wrote: If my streaming replication stops working, I want to know about it as soon as possible. WARNING just doesn't cut it. This needs some better thought. If we PANIC, then

Re: [HACKERS] warning message in standby

2010-06-14 Thread Tom Lane
Bruce Momjian br...@momjian.us writes: Magnus Hagander wrote: It means that we can't prevent people from configuring their tools to ignore important warning. We can't prevent them from ignoring ERROR or FATAL either... My point is that most tools are going to look at the tag first to

Re: [HACKERS] warning message in standby

2010-06-14 Thread Robert Haas
On Mon, Jun 14, 2010 at 10:08 AM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: Magnus Hagander wrote: It means that we can't prevent people from configuring their tools to ignore important warning. We can't prevent them from ignoring ERROR or FATAL either... My

Re: [HACKERS] warning message in standby

2010-06-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Mon, Jun 14, 2010 at 10:08 AM, Tom Lane t...@sss.pgh.pa.us wrote: The correct log level for this message is LOG.  End of discussion. Why? Because it's not being issued in a user's session. The only place it can go is to the system log, and if you

Re: [HACKERS] warning message in standby

2010-06-14 Thread Robert Haas
On Mon, Jun 14, 2010 at 10:30 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Jun 14, 2010 at 10:08 AM, Tom Lane t...@sss.pgh.pa.us wrote: The correct log level for this message is LOG.  End of discussion. Why? Because it's not being issued in a

Re: [HACKERS] warning message in standby

2010-06-14 Thread Bruce Momjian
Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Jun 14, 2010 at 10:08 AM, Tom Lane t...@sss.pgh.pa.us wrote: The correct log level for this message is LOG. End of discussion. Why? Because it's not being issued in a user's session. The only place it can go is to the

Re: [HACKERS] warning message in standby

2010-06-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: I'm willing to buy the above, but nobody has explained to my satisfaction why it's remotely sane to go into an infinite retry loop on an unrecoverable error. That's a different question altogether ;-). I assume you're not satisfied by the change

Re: [HACKERS] warning message in standby

2010-06-14 Thread Robert Haas
On Mon, Jun 14, 2010 at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I'm willing to buy the above, but nobody has explained to my satisfaction why it's remotely sane to go into an infinite retry loop on an unrecoverable error. That's a different

Re: [HACKERS] warning message in standby

2010-06-14 Thread Simon Riggs
On Mon, 2010-06-14 at 10:30 -0400, Tom Lane wrote: I'm totally unimpressed by the argument that log-filtering applications don't know enough to pay attention to LOG messages. There are already a lot of those that are quite important to notice. We have a log level where 1 log entry in a

Re: [HACKERS] warning message in standby

2010-06-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Mon, Jun 14, 2010 at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: That's a different question altogether ;-).  I assume you're not satisfied by the change Heikki committed a couple hours ago? It will at least try to do something to recover. Yeah,

Re: [HACKERS] warning message in standby

2010-06-14 Thread Robert Haas
On Mon, Jun 14, 2010 at 10:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Jun 14, 2010 at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: That's a different question altogether ;-).  I assume you're not satisfied by the change Heikki committed a couple

Re: [HACKERS] warning message in standby

2010-06-14 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: Should I be downgrading Hot Standby breakages to LOG? That will certainly help high availability as well. If a message is being issued in a non-user-connected session, there is basically not a lot of point in WARNING or below. It should either be LOG,

Re: [HACKERS] warning message in standby

2010-06-14 Thread Robert Haas
On Mon, Jun 14, 2010 at 11:09 AM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: Should I be downgrading Hot Standby breakages to LOG? That will certainly help high availability as well. If a message is being issued in a non-user-connected session, there is

Re: [HACKERS] warning message in standby

2010-06-14 Thread Simon Riggs
On Mon, 2010-06-14 at 11:14 -0400, Robert Haas wrote: On Mon, Jun 14, 2010 at 11:09 AM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: Should I be downgrading Hot Standby breakages to LOG? That will certainly help high availability as well. If a message is

Re: [HACKERS] warning message in standby

2010-06-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Mon, Jun 14, 2010 at 11:09 AM, Tom Lane t...@sss.pgh.pa.us wrote: If a message is being issued in a non-user-connected session, there is basically not a lot of point in WARNING or below.  It should either be LOG, or ERROR/FATAL/PANIC (which are

Re: [HACKERS] warning message in standby

2010-06-14 Thread Robert Haas
On Mon, Jun 14, 2010 at 11:34 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2010-06-14 at 11:14 -0400, Robert Haas wrote: On Mon, Jun 14, 2010 at 11:09 AM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: Should I be downgrading Hot Standby breakages to

Re: [HACKERS] warning message in standby

2010-06-14 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes: On Mon, Jun 14, 2010 at 11:09 AM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: Should I be downgrading Hot Standby breakages to LOG? That will certainly help high availability as well. If a message is being issued in a

Re: [HACKERS] warning message in standby

2010-06-14 Thread Simon Riggs
On Mon, 2010-06-14 at 18:11 +0200, Dimitri Fontaine wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Jun 14, 2010 at 11:09 AM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: Should I be downgrading Hot Standby breakages to LOG? That will certainly

Re: [HACKERS] warning message in standby

2010-06-14 Thread Robert Haas
On Mon, Jun 14, 2010 at 12:31 PM, Simon Riggs si...@2ndquadrant.com wrote: If that's the case, I guess Tom's right, once more, saying that LOG is fine here. If we want to be more subtle than that, we'd need to revise each and every error message and attribute it the right level, which it

Re: [HACKERS] warning message in standby

2010-06-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: Not sure I agree with this - what I think the problem is here is we need to make a clear distinction between recoverable errors and unrecoverable errors. Um, if it's recoverable, it's not really an error ... regards, tom lane

Re: [HACKERS] warning message in standby

2010-06-14 Thread Robert Haas
On Mon, Jun 14, 2010 at 1:00 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Not sure I agree with this - what I think the problem is here is we need to make a clear distinction between recoverable errors and unrecoverable errors. Um, if it's recoverable,

Re: [HACKERS] warning message in standby

2010-06-14 Thread Simon Riggs
On Mon, 2010-06-14 at 11:09 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: Should I be downgrading Hot Standby breakages to LOG? That will certainly help high availability as well. If a message is being issued in a non-user-connected session, there is basically not a lot

Re: [HACKERS] warning message in standby

2010-06-14 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: LOG is already over-used and so anything said at that level is drowned. This is nonsense. regards, tom lane

Re: [HACKERS] warning message in standby

2010-06-14 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote: LOG is already over-used and so anything said at that level is drowned. In many areas of code we cannot use a higher level without trauma. That is a problem since we have no way to separate the truly important from the barely interesting. The fact

Re: [HACKERS] warning message in standby

2010-06-14 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes: Simon Riggs si...@2ndquadrant.com wrote: LOG is already over-used and so anything said at that level is drowned. In many areas of code we cannot use a higher level without trauma. That is a problem since we have no way to separate the truly

Re: [HACKERS] warning message in standby

2010-06-14 Thread Josh Berkus
On 6/14/10 7:57 AM, Tom Lane wrote: However, I do agree that it's not helpful to loop forever. If we can easily make it retry once and then PANIC, I'd be for that --- otherwise I tend to agree that the best thing is just to PANIC immediately. There are many many situations where a slave

Re: [HACKERS] warning message in standby

2010-06-14 Thread Magnus Hagander
On Mon, Jun 14, 2010 at 20:22, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: LOG is already over-used and so anything said at that level is drowned. This is nonsense. Whether it's over-used or not may be, but that doesn't make the general issue nonsense. But

Re: [HACKERS] warning message in standby

2010-06-14 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: The fact that LOG is categorized the same as INFO has led me to believe that they are morally equivalent -- They are not morally equivalent. INFO is for output that the user has explicitly requested

Re: [HACKERS] warning message in standby

2010-06-14 Thread Fujii Masao
On Tue, Jun 15, 2010 at 12:09 AM, Robert Haas robertmh...@gmail.com wrote: The testing that I have been doing while we've been discussing this reveals that you are correct.  I set up an HS/SR master and slave (running on the same machine), ran pgbench on the master, and then started randomly

Re: [HACKERS] warning message in standby

2010-06-14 Thread Robert Haas
On Mon, Jun 14, 2010 at 10:35 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jun 15, 2010 at 12:09 AM, Robert Haas robertmh...@gmail.com wrote: The testing that I have been doing while we've been discussing this reveals that you are correct.  I set up an HS/SR master and slave (running

Re: [HACKERS] warning message in standby

2010-06-11 Thread Simon Riggs
On Thu, 2010-06-10 at 09:57 -0400, Robert Haas wrote: On Mon, Jun 7, 2010 at 9:21 AM, Fujii Masao masao.fu...@gmail.com wrote: When an error is found in the WAL streamed from the master, a warning message is repeated without interval forever in the standby. This consumes CPU load very

Re: [HACKERS] warning message in standby

2010-06-11 Thread Heikki Linnakangas
On 11/06/10 07:18, Fujii Masao wrote: On Fri, Jun 11, 2010 at 1:01 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: We're talking about a corrupt record (incorrect CRC, incorrect backlink etc.), not errors within redo functions. During crash recovery, a corrupt record means

Re: [HACKERS] warning message in standby

2010-06-11 Thread Robert Haas
On Fri, Jun 11, 2010 at 8:19 AM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, 2010-06-10 at 09:57 -0400, Robert Haas wrote: On Mon, Jun 7, 2010 at 9:21 AM, Fujii Masao masao.fu...@gmail.com wrote: When an error is found in the WAL streamed from the master, a warning message is repeated

Re: [HACKERS] warning message in standby

2010-06-11 Thread Fujii Masao
On Fri, Jun 11, 2010 at 9:32 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Hmm, right now it doesn't even reconnect when it sees a corrupt record streamed from the master. It's really pointless to retry in that case, reapplying the exact same piece of WAL surely won't work.

Re: [HACKERS] warning message in standby

2010-06-11 Thread Simon Riggs
On Thu, 2010-06-10 at 19:01 +0300, Heikki Linnakangas wrote: What warning message are we talking about? All the error cases I can think of in WAL-application are ERROR, or likely even PANIC. We're talking about a corrupt record (incorrect CRC, incorrect backlink etc.), not errors

Re: [HACKERS] warning message in standby

2010-06-11 Thread Robert Haas
On Fri, Jun 11, 2010 at 9:43 AM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, 2010-06-10 at 19:01 +0300, Heikki Linnakangas wrote: What warning message are we talking about?  All the error cases I can think of in WAL-application are ERROR, or likely even PANIC. We're talking about a

Re: [HACKERS] warning message in standby

2010-06-11 Thread Bruce Momjian
Robert Haas wrote: If my streaming replication stops working, I want to know about it as soon as possible. WARNING just doesn't cut it. This needs some better thought. If we PANIC, then surely it will PANIC again when we restart unless we do something. So we can't do that. But we

Re: [HACKERS] warning message in standby

2010-06-10 Thread Robert Haas
On Mon, Jun 7, 2010 at 9:21 AM, Fujii Masao masao.fu...@gmail.com wrote: When an error is found in the WAL streamed from the master, a warning message is repeated without interval forever in the standby. This consumes CPU load very much, and would interfere with read-only queries. To fix this

Re: [HACKERS] warning message in standby

2010-06-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Mon, Jun 7, 2010 at 9:21 AM, Fujii Masao masao.fu...@gmail.com wrote: When an error is found in the WAL streamed from the master, a warning message is repeated without interval forever in the standby. This consumes CPU load very much, and would

Re: [HACKERS] warning message in standby

2010-06-10 Thread Heikki Linnakangas
On 10/06/10 17:38, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Jun 7, 2010 at 9:21 AM, Fujii Masao masao.fu...@gmail.com wrote: When an error is found in the WAL streamed from the master, a warning message is repeated without interval forever in the standby. This consumes

Re: [HACKERS] warning message in standby

2010-06-10 Thread Robert Haas
On Thu, Jun 10, 2010 at 12:01 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: We're talking about a corrupt record (incorrect CRC, incorrect backlink etc.), not errors within redo functions. During crash recovery, a corrupt record means you've reached end of WAL. In standby

Re: [HACKERS] warning message in standby

2010-06-10 Thread Greg Stark
On Thu, Jun 10, 2010 at 5:13 PM, Robert Haas robertmh...@gmail.com wrote: At this point you should have a working HS/SR setup.  Now: 8. shut the slave down 9. move recovery.conf out of the way 10. restart the slave - it will do recovery and enter normal running 11. make some database changes

Re: [HACKERS] warning message in standby

2010-06-10 Thread Robert Haas
On Thu, Jun 10, 2010 at 12:49 PM, Greg Stark gsst...@mit.edu wrote: On Thu, Jun 10, 2010 at 5:13 PM, Robert Haas robertmh...@gmail.com wrote: At this point you should have a working HS/SR setup.  Now: 8. shut the slave down 9. move recovery.conf out of the way 10. restart the slave - it will

Re: [HACKERS] warning message in standby

2010-06-10 Thread Fujii Masao
On Fri, Jun 11, 2010 at 1:01 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: We're talking about a corrupt record (incorrect CRC, incorrect backlink etc.), not errors within redo functions. During crash recovery, a corrupt record means you've reached end of WAL. In standby

[HACKERS] warning message in standby

2010-06-07 Thread Fujii Masao
Hi, When an error is found in the WAL streamed from the master, a warning message is repeated without interval forever in the standby. This consumes CPU load very much, and would interfere with read-only queries. To fix this problem, we should add a sleep into emode_for_corrupt_record() or
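
For readers of this thread, the two ideas discussed above (sleeping between retries instead of spinning, and giving up after a bounded number of attempts rather than looping forever) can be sketched generically as follows. This is only an illustration in Python, not PostgreSQL code: read_with_backoff, read_record, and all of the parameters are hypothetical names, and nothing here reflects what was actually committed.

```python
import time


def read_with_backoff(read_record, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call read_record() until it returns a record.

    read_record is a hypothetical callable that returns None when the
    next record is missing or fails validation. Between failed attempts
    we sleep, doubling the delay up to max_delay, so a persistent error
    does not turn into a busy loop. After max_attempts failures we stop
    retrying and raise, leaving the decision to the caller.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        record = read_record()
        if record is not None:
            return record
        if attempt == max_attempts:
            break  # no point sleeping after the final attempt
        sleep(delay)
        delay = min(delay * 2, max_delay)
    raise RuntimeError(f"no valid record after {max_attempts} attempts")
```

The sleep function is passed in as a parameter only so that the behaviour can be exercised without real delays. Whether a bounded retry is appropriate at all depends on whether the failure is transient (for example a partially copied file) or permanent (a genuinely corrupt record), which is the distinction the thread argues about.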