Re: [HACKERS] Archiver not exiting upon crash

2012-05-25 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes: So my test harness is an inexplicably effective show-case for the vulnerability, but it is not the reason the vulnerability should be fixed. I spent a bit of time looking into this. In principle the postmaster could be fixed to repeat the SIGQUIT signal

Re: [HACKERS] Archiver not exiting upon crash

2012-05-24 Thread Jeff Janes
On Wed, May 23, 2012 at 2:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: Jeff Janes jeff.ja...@gmail.com writes: But what happens if the SIGQUIT is blocked before the system(3) is invoked? Does the ignore take precedence over the block, or does the block take precedence over the ignore,

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Peter Eisentraut
On mån, 2012-05-21 at 12:52 -0400, Tom Lane wrote: I see Peter's commit d6de43099ac0bddb4b1da40088487616da892164 only touched postgres.c's quickdie(), and not all the *other* background processes with identical coding. That seems a clear oversight, so I will go fix it. None[*] of the other

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Peter Eisentraut
On mån, 2012-05-21 at 13:14 -0400, Tom Lane wrote: ... but having said that, I see Peter's commit d6de43099ac0bddb4b1da40088487616da892164 only touched postgres.c's quickdie(), and not all the *other* background processes with identical coding. That seems a clear oversight, so I will go

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes: On mån, 2012-05-21 at 13:14 -0400, Tom Lane wrote: ... wait, scratch that. AFAICS, that commit was totally useless, because BlockSig should always already contain SIGQUIT. No, because PostgresMain() deletes it from BlockSig. Ah. So potentially we

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Fujii Masao
On Thu, May 24, 2012 at 1:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: On mån, 2012-05-21 at 13:14 -0400, Tom Lane wrote: ... wait, scratch that. AFAICS, that commit was totally useless, because BlockSig should always already contain SIGQUIT. No,

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Jeff Janes
On Mon, May 21, 2012 at 9:22 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, May 19, 2012 at 1:23 AM, Jeff Janes jeff.ja...@gmail.com wrote: I've been testing the crash recovery of REL9_2_BETA1, using the same method I posted in the Scaling XLog insertion thread. I have the checkpointer

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes: It looks to me like the SIGQUIT from the postmaster is simply getting lost. And from what little I understand of signal handling, this is a known race with system(3). The archive_command, child of archiver, exits before it can receive the signal sent

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Tom Lane
I wrote: On my machine, man system(3) saith: system() ignores the SIGINT and SIGQUIT signals, and blocks the SIGCHLD signal, while waiting for the command to terminate. If this might cause the application to miss a signal that would have killed it, the application should
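For readers following the system(3) point above, here is a minimal sketch of what inspecting the value returned by system() can look like. It is not the code from pgarch.c and not the fix discussed in this thread; the names `cmd_outcome`, `classify_system_status` and `run_command` are hypothetical and exist only for illustration. It assumes a POSIX system where the status can be decoded with the `<sys/wait.h>` macros.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical classification of a system(3) result; not PostgreSQL code. */
enum cmd_outcome
{
    CMD_OK,         /* command ran and exited with status 0 */
    CMD_FAILED,     /* command ran and exited with a nonzero status */
    CMD_SIGNALED,   /* command was terminated by a signal */
    CMD_NOT_RUN     /* system() itself failed (e.g. fork failure) */
};

/* Decode the wait status that system(3) returns. */
enum cmd_outcome
classify_system_status(int rc)
{
    if (rc == -1)
        return CMD_NOT_RUN;
    if (WIFSIGNALED(rc))
        return CMD_SIGNALED;
    if (WIFEXITED(rc) && WEXITSTATUS(rc) == 0)
        return CMD_OK;
    return CMD_FAILED;
}

/*
 * Run a shell command and report how it ended.  While system() waits,
 * the caller ignores SIGINT and SIGQUIT, so a caller that wants to react
 * to a signal-driven termination has to look at the status afterwards.
 */
enum cmd_outcome
run_command(const char *cmd)
{
    return classify_system_status(system(cmd));
}
```

A caller could treat `CMD_SIGNALED` as a cue to stop rather than retry. Note that this only catches the case where the command itself died from a signal; it does not help if the signal arrived and was ignored after the command had already exited, which is the race described elsewhere in this thread.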

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Jeff Janes
On Wed, May 23, 2012 at 1:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Jeff Janes jeff.ja...@gmail.com writes: It looks to me like the SIGQUIT from the postmaster is simply getting lost. And from what little I understand of signal handling, this is a known race with system(3). The

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes: On Wed, May 23, 2012 at 1:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: On my machine, man system(3) saith: system() ignores the SIGINT and SIGQUIT signals, and blocks the SIGCHLD signal, while waiting for the command to terminate. If this

Re: [HACKERS] Archiver not exiting upon crash

2012-05-23 Thread Tom Lane
I wrote: Jeff Janes jeff.ja...@gmail.com writes: But what happens if the SIGQUIT is blocked before the system(3) is invoked? Does the ignore take precedence over the block, or does the block take precedence over the ignore, and so the signal is still waiting once the block is reversed after

Re: [HACKERS] Archiver not exiting upon crash

2012-05-21 Thread Fujii Masao
On Sat, May 19, 2012 at 1:23 AM, Jeff Janes jeff.ja...@gmail.com wrote: I've been testing the crash recovery of REL9_2_BETA1, using the same method I posted in the Scaling XLog insertion thread. I have the checkpointer occasionally throw a FATAL error, We should also fix this problem? If yes,

Re: [HACKERS] Archiver not exiting upon crash

2012-05-21 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes: ... sometimes the automatic recovery never initiates. It looks like the postmaster is waiting for the archiver to exit before it starts recovery, and the archiver is waiting for something, I don't really know what. Can you try poking into the

Re: [HACKERS] Archiver not exiting upon crash

2012-05-21 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes: You might have gotten the following problem which was discussed before. This problem was fixed in SIGQUIT signal handler of a backend, but ISTM not that of an archiver. http://archives.postgresql.org/pgsql-admin/2009-11/msg00088.php pgarch.c's SIGQUIT

Re: [HACKERS] Archiver not exiting upon crash

2012-05-21 Thread Tom Lane
I wrote: Fujii Masao masao.fu...@gmail.com writes: You might have gotten the following problem which was discussed before. This problem was fixed in SIGQUIT signal handler of a backend, but ISTM not that of an archiver. http://archives.postgresql.org/pgsql-admin/2009-11/msg00088.php

Re: [HACKERS] Archiver not exiting upon crash

2012-05-21 Thread Tom Lane
I wrote: ... but having said that, I see Peter's commit d6de43099ac0bddb4b1da40088487616da892164 only touched postgres.c's quickdie(), and not all the *other* background processes with identical coding. That seems a clear oversight, so I will go fix it. Doesn't explain why the archiver

[HACKERS] Archiver not exiting upon crash

2012-05-18 Thread Jeff Janes
I've been testing the crash recovery of REL9_2_BETA1, using the same method I posted in the Scaling XLog insertion thread. I have the checkpointer occasionally throw a FATAL error, which causes the postmaster to take down all of the other processes (DETAIL: The postmaster has commanded this