Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-24 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> Yeah, I added that recently to try to detect postmaster children > >> that exit without cleaning up properly. I seem to have missed this > >> error case :-(. Actually it looks like fork failure for regular > >> backends gets it wr

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-24 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> Yeah, I added that recently to try to detect postmaster children >> that exit without cleaning up properly. I seem to have missed this >> error case :-(. Actually it looks like fork failure for regular >> backends gets it wrong too :-( :-( --- would yo

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-24 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > Zdenek Kotala wrote: > >> Just a confirmation that Alvaro's patch+ReleasePostmasterChildSlot() fix > >> solves the problem and PostgreSQL survives well during a memory > >> shortages. > > > So this patch would do it. > > Looks good to me, but I think

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-24 Thread Tom Lane
Alvaro Herrera writes: > Zdenek Kotala wrote: >> Just a confirmation that Alvaro's patch+ReleasePostmasterChildSlot() fix >> solves the problem and PostgreSQL survives well during a memory >> shortages. > So this patch would do it. Looks good to me, but I think you should also increase the avlau

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-24 Thread Alvaro Herrera
Zdenek Kotala wrote: > Just a confirmation that Alvaro's patch+ReleasePostmasterChildSlot() fix > solves the problem and PostgreSQL survives well during a memory > shortages. So this patch would do it. I think this stuff about postmaster child slots is later than launcher/worker split. I don't

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-24 Thread Tom Lane
Zdenek Kotala writes: > ... We can see there that AVlauncher really wait 100ms, but it is not enough > when system is under stress. OK, thanks for checking that. > I think that Alvaro's patch is good and it fix a crash problem. I also > think that AVlauncher could wait little bit more then 100ms

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-24 Thread Zdenek Kotala
Zdenek Kotala wrote on Mon, 24. 08. 2009 at 13:47 +0200: > I tested Alvaro's patch and it works, because it does not lead to stack > consumption, but it shows another bug in StartAutovacuumWorker() code. > When fork fails bn structure is freed but > ReleasePostmasterChildSlot() should be called as we

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-24 Thread Zdenek Kotala
Tom Lane wrote on Sat, 22. 08. 2009 at 09:56 -0400: > Zdenek Kotala writes: > > There are most important records from yesterdays issues. > > Messages: > > - > > Aug 20 11:14:54 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap > > space to grow stack for pid 507 (postgres) > > Hmm,

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-22 Thread Tom Lane
Zdenek Kotala writes: > There are most important records from yesterdays issues. > Messages: > - > Aug 20 11:14:54 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap > space to grow stack for pid 507 (postgres) Hmm, that seems to confirm the idea that something had run the machin

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-22 Thread Zdenek Kotala
Tom Lane wrote on Fri, 21. 08. 2009 at 18:06 -0400: > Maybe, but I think we need to understand exactly what happened first. I try to mine more data from the system to reconstruct what happen. Unfortunately, default postgresql log configuration does not have timestamp. The postgresql had no load, syst

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-22 Thread Zdenek Kotala
Alvaro Herrera wrote on Fri, 21. 08. 2009 at 17:48 -0400: > Tom Lane wrote: > > > I'd still like to have some fork-rate-limiting behavior in there > > somewhere. However, it might make sense for the avlauncher to do that > > rather than the postmaster. Does that idea seem more implementable? > >

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-21 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> I'd still like to have some fork-rate-limiting behavior in there >> somewhere. However, it might make sense for the avlauncher to do that >> rather than the postmaster. Does that idea seem more implementable? > Well, there's already rate limiting in t

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-21 Thread Alvaro Herrera
Tom Lane wrote: > I'd still like to have some fork-rate-limiting behavior in there > somewhere. However, it might make sense for the avlauncher to do that > rather than the postmaster. Does that idea seem more implementable? Well, there's already rate limiting in the launcher:

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-21 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> It does seem that we ought to change things so that there's a bit more >> delay before trying to re-launch a failed autovac worker, though. >> Whatever caused this was effectively turning the autovac logic into >> a fork-bomb engine. I'm not thinking of

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-21 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > If sigusr1_handler needs rewriting, don't all the other sighandler as > > well? > > It does not, and neither do they. I'm not sure what happened here but > it wasn't the fault of the postmaster's organization of signal handlers. > > It does seem that

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-21 Thread Tom Lane
Alvaro Herrera writes: > If sigusr1_handler needs rewriting, don't all the other sighandler as > well? It does not, and neither do they. I'm not sure what happened here but it wasn't the fault of the postmaster's organization of signal handlers. It does seem that we ought to change things so th

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-21 Thread Zdenek Kotala
Alvaro Herrera wrote on Fri, 21. 08. 2009 at 15:40 -0400: > Zdenek Kotala wrote: > > > The problem what I see here is that StartAutovacuumWorker() fails and > > send SIGUSR1 to the postmaster, but it send it too quickly and signal > > handler is still active. When signal mask is unblocked in > > sigus

Re: [HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-21 Thread Alvaro Herrera
Zdenek Kotala wrote: > The problem what I see here is that StartAutovacuumWorker() fails and > send SIGUSR1 to the postmaster, but it send it too quickly and signal > handler is still active. When signal mask is unblocked in > sigusr1_handler() than signal handler is run again... > > The reason w

[HACKERS] SIGUSR1 pingpong between master na autovacum launcher causes crash

2009-08-21 Thread Zdenek Kotala
I found the following core file of PG 8.4.0 on my system (Solaris Nevada b119):

 fe8ae42d _dowrite (85bf6e8, 3a, 8035e3c, 80350e8) + 8d
 fe8ae743 _ndoprnt (85bf6e8, 8035ec8, 8035e3c, 0) + 2ba
 fe8b322d vsnprintf (85bfaf0, 3ff, 85bf6e8, 8035ec8, 0, 0) + 65
 082194ea appendStringInfoVA (8035e9c, 85bf6e8