On Fri, Jun 28, 2013 at 6:00 PM, Alvaro Herrera wrote:
> MauMau wrote:
>
> Hi,
>
>> I did this. Please find attached the revised patch. I modified
>> HandleChildCrash(). I tested the immediate shutdown, and the child
>> cleanup succeeded.
>
> Thanks, committed.
>
> There are two matters pend
MauMau wrote:
Hi,
> I did this. Please find attached the revised patch. I modified
> HandleChildCrash(). I tested the immediate shutdown, and the child
> cleanup succeeded.
Thanks, committed.
There are two matters pending here:
1. do we want postmaster to exit immediately after sending t
Hi, Alvaro san,
From: "Alvaro Herrera"
MauMau wrote:
Yeah, I see that --- after removing that early exit, there are unwanted
messages. And in fact there are some signals sent that weren't
previously sent. Clearly we need something here: if we're in immediate
shutdown handler, don't signal
From: "Alvaro Herrera"
Yeah, I see that --- after removing that early exit, there are unwanted
messages. And in fact there are some signals sent that weren't
previously sent. Clearly we need something here: if we're in immediate
shutdown handler, don't signal anyone (because they have already
MauMau wrote:
> From: "Alvaro Herrera"
> >Actually, in further testing I noticed that the fast-path you introduced
> >in BackendCleanup (or was it HandleChildCrash?) in the immediate
> >shutdown case caused postmaster to fail to clean up properly after
> >sending the SIGKILL signal, so I had t
From: "Robert Haas"
On Fri, Jun 21, 2013 at 10:02 PM, MauMau wrote:
I'm comfortable with 5 seconds. We are talking about the interval between
sending SIGQUIT to the children and then sending SIGKILL to them. In most
situations, the backends should terminate immediately. However, as I said
From: "Alvaro Herrera"
MauMau wrote:
I thought of adding some new state of pmState for some reason (that
might be the same as your idea).
But I refrained from doing that, because pmState already has many
states. I was afraid adding a new pmState value for this bug fix
would complicate the s
On Fri, Jun 21, 2013 at 10:02 PM, MauMau wrote:
> I'm comfortable with 5 seconds. We are talking about the interval between
> sending SIGQUIT to the children and then sending SIGKILL to them. In most
> situations, the backends should terminate immediately. However, as I said a
> few months ago,
MauMau wrote:
> Are you suggesting simplifying the following part in ServerLoop()?
> I welcome the idea if this condition becomes simpler. However, I
> cannot imagine how.
> if (AbortStartTime > 0 && /* SIGKILL only once */
> (Shutdown == ImmediateShutdown || (FatalError && !SendStop)) &&
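For readers trying to follow the ServerLoop() fragment above: the idea is to
remember when SIGQUIT went out and, once a timeout has elapsed, escalate to
SIGKILL exactly once. Below is a minimal self-contained sketch of that
pattern; the constant and helper names are invented for illustration, and
this is not the actual postmaster.c logic.

    #include <signal.h>
    #include <sys/types.h>
    #include <time.h>

    #define SIGKILL_CHILDREN_AFTER_SECS 5   /* the timeout being debated (5s vs. longer) */

    /* Stand-ins for postmaster bookkeeping; names only echo the thread. */
    static time_t AbortStartTime = 0;       /* when SIGQUIT was broadcast; 0 = not sent */
    static pid_t  children[16];
    static int    nchildren = 0;

    /* Broadcast a signal to every remaining child (illustrative only). */
    static void
    signal_children(int sig)
    {
        for (int i = 0; i < nchildren; i++)
            if (children[i] > 0)
                kill(children[i], sig);
    }

    /*
     * Called from the server's main loop: if SIGQUIT was sent more than
     * SIGKILL_CHILDREN_AFTER_SECS ago and children remain, send SIGKILL,
     * and clear the timestamp so the escalation happens only once.
     */
    static void
    escalate_if_timed_out(void)
    {
        if (AbortStartTime > 0 &&
            time(NULL) - AbortStartTime >= SIGKILL_CHILDREN_AFTER_SECS)
        {
            signal_children(SIGKILL);
            AbortStartTime = 0;             /* "SIGKILL only once" */
        }
    }

    int
    main(void)
    {
        /* In the real postmaster this check would live inside ServerLoop(). */
        escalate_if_timed_out();
        return 0;
    }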
From: "Robert Haas"
On Fri, Jun 21, 2013 at 2:55 PM, Tom Lane wrote:
Robert Haas writes:
More generally, what do we think the point is of sending SIGQUIT
rather than SIGKILL in the first place, and why does that point cease
to be valid after 5 seconds?
Well, mostly it's about telling the c
From: "Robert Haas"
On Thu, Jun 20, 2013 at 12:33 PM, Alvaro Herrera wrote:
I will go with 5 seconds, then.
I'm uncomfortable with this whole concept, and particularly with such
a short timeout. On a very busy system, things can take a LOT longer
than we think they should; it can take 30 se
The case where I wanted "routine" shutdown immediate (and I'm not sure I
ever actually got it) was when we were using IBM HA/CMP, where I wanted a
"terminate with a fair bit of prejudice".
If we know we want to "switch right away now", immediate seemed pretty much
right. I was fine with interrupt
On Fri, Jun 21, 2013 at 2:55 PM, Tom Lane wrote:
> Robert Haas writes:
>> More generally, what do we think the point is of sending SIGQUIT
>> rather than SIGKILL in the first place, and why does that point cease
>> to be valid after 5 seconds?
>
> Well, mostly it's about telling the client we're
Robert Haas writes:
> More generally, what do we think the point is of sending SIGQUIT
> rather than SIGKILL in the first place, and why does that point cease
> to be valid after 5 seconds?
Well, mostly it's about telling the client we're committing hara-kiri.
Without that, there's no very good r
On Thu, Jun 20, 2013 at 12:33 PM, Alvaro Herrera wrote:
> I will go with 5 seconds, then.
I'm uncomfortable with this whole concept, and particularly with such
a short timeout. On a very busy system, things can take a LOT longer
than we think they should; it can take 30 seconds or more just to g
From: "Alvaro Herrera"
Actually, I think it would be cleaner to have a new state in pmState,
namely PM_IMMED_SHUTDOWN which is entered when we send SIGQUIT. When
we're in this state, postmaster is only waiting for the timeout to
expire; and when it does, it sends SIGKILL and exits. Pretty much
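Stated as code, the suggestion amounts to a dedicated state plus a deadline
check. The sketch below is a heavily reduced illustration with invented
names (the real pmState machine has many more states); it only shows the
shape of the proposal, not its implementation.

    #include <stdbool.h>
    #include <time.h>

    typedef enum
    {
        PM_RUN,                 /* normal operation */
        PM_IMMED_SHUTDOWN,      /* proposed: SIGQUIT sent, now just watching the clock */
        PM_NO_CHILDREN          /* all children gone */
    } PMState;

    static PMState pmState = PM_RUN;
    static time_t  sigquit_sent_at = 0;

    /* Enter the proposed state at the moment the immediate-shutdown SIGQUIT goes out. */
    static void
    begin_immediate_shutdown(void)
    {
        /* (broadcast SIGQUIT to the children here) */
        sigquit_sent_at = time(NULL);
        pmState = PM_IMMED_SHUTDOWN;
    }

    /* While in PM_IMMED_SHUTDOWN, the only question is whether the deadline passed. */
    static bool
    immediate_shutdown_timed_out(int timeout_secs)
    {
        return pmState == PM_IMMED_SHUTDOWN &&
               time(NULL) - sigquit_sent_at >= timeout_secs;
    }

    int
    main(void)
    {
        begin_immediate_shutdown();
        if (immediate_shutdown_timed_out(5))
        {
            /* (broadcast SIGKILL and have the postmaster exit, per the proposal) */
        }
        return 0;
    }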
From: "Alvaro Herrera"
MauMau wrote:
One concern is that umount would fail in such a situation because
postgres has some open files on the filesystem, which is on the
shared disk in the case of a traditional HA cluster.
See my reply to Noah. If postmaster stays around, would this be any
differ
On Thu, Jun 20, 2013 at 3:40 PM, MauMau wrote:
>
> Here, "reliable" means that the database server is certainly shut
>>> down when pg_ctl returns, not telling a lie that "I shut down the
>>> server processes for you, so you do not have to be worried that some
>>> postgres process might still rema
Actually, I think it would be cleaner to have a new state in pmState,
namely PM_IMMED_SHUTDOWN which is entered when we send SIGQUIT. When
we're in this state, postmaster is only waiting for the timeout to
expire; and when it does, it sends SIGKILL and exits. Pretty much the
same you have, except
MauMau wrote:
> From: "Alvaro Herrera"
> One concern is that umount would fail in such a situation because
> postgres has some open files on the filesystem, which is on the
> shared disk in the case of a traditional HA cluster.
See my reply to Noah. If postmaster stays around, would this be any
di
From: "Alvaro Herrera"
I will go with 5 seconds, then.
OK, I agree.
My point is that there is no difference. For one thing, once we enter
immediate shutdown state, and sigkill has been sent, no further action
is taken. Postmaster will just sit there indefinitely until processes
are gone.
MauMau wrote:
> First, thank you for the review.
>
> From: "Alvaro Herrera"
> >This seems reasonable. Why 10 seconds? We could wait 5 seconds, or 15.
> >Is there a rationale behind the 10? If we said 60, that would fit
> >perfectly well within the already existing 60-second loop in postmast
First, thank you for the review.
From: "Alvaro Herrera"
This seems reasonable. Why 10 seconds? We could wait 5 seconds, or 15.
Is there a rationale behind the 10? If we said 60, that would fit
perfectly well within the already existing 60-second loop in postmaster,
but that seems way too lon
MauMau wrote:
> Could you review the patch? The summary of the change is:
> 1. postmaster waits for children to terminate when it gets an
> immediate shutdown request, instead of exiting.
>
> 2. postmaster sends SIGKILL to remaining children if all of the
> child processes do not terminate wi
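The behaviour summarized in points 1 and 2 can be pictured with a small
standalone program: signal the children, give them a grace period, then
force-kill and reap whatever is left. This is only an illustration of the
flow, not the patch itself; the helper name, the one-second polling, and the
5-second grace period are arbitrary choices here.

    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <unistd.h>

    /* Ask children to quit, wait up to grace_secs, then SIGKILL the stragglers. */
    static void
    shutdown_children(pid_t *pids, int n, int grace_secs)
    {
        time_t deadline = time(NULL) + grace_secs;
        int    remaining = n;

        for (int i = 0; i < n; i++)
            if (pids[i] > 0)
                kill(pids[i], SIGQUIT);             /* step 1: request termination */

        while (remaining > 0 && time(NULL) < deadline)
        {
            pid_t pid = waitpid(-1, NULL, WNOHANG); /* reap without blocking */
            if (pid > 0)
            {
                for (int i = 0; i < n; i++)
                    if (pids[i] == pid)
                        pids[i] = 0;                /* mark as gone */
                remaining--;
            }
            else
                sleep(1);                           /* crude poll; real code would use a timer */
        }

        for (int i = 0; i < n; i++)
            if (pids[i] > 0)
                kill(pids[i], SIGKILL);             /* step 2: stop waiting politely */

        while (remaining > 0 && waitpid(-1, NULL, 0) > 0)
            remaining--;                            /* reap the rest so nothing is left behind */
    }

    int
    main(void)
    {
        pid_t pids[1];

        pids[0] = fork();
        if (pids[0] == 0)
        {
            signal(SIGQUIT, SIG_IGN);   /* child ignores SIGQUIT to force the SIGKILL path */
            pause();
            _exit(0);
        }
        shutdown_children(pids, 1, 5);
        return 0;
    }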
Hello, Tom-san, folks,
From: "Tom Lane"
I think if we want to make it bulletproof we'd have to do what the
OP suggested and switch to SIGKILL. I'm not enamored of that for the
reasons I mentioned --- but one idea that might dodge the disadvantages
is to have the postmaster wait a few seconds a
Andres Freund writes:
> On 2013-02-01 08:55:24 -0500, Peter Eisentraut wrote:
>> I found an old patch that I had prepared for this, which I have
>> attached. YMMV.
>> +static void
>> +quickdie_alarm_handler(SIGNAL_ARGS)
>> +{
>> +/*
>> + * We got here if ereport() was blocking, so don't
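Only the first few lines of that old patch survive in the preview above.
Purely as a guess at the general shape it implies, and not the actual patch:
arm a SIGALRM before doing the risky message reporting in quickdie(), so
that if ereport() blocks, the alarm handler still gets the process off the
stage using only async-signal-safe calls. The names and the 5-second value
below are illustrative.

    #include <signal.h>
    #include <unistd.h>

    /* Fires only if the message-emitting path below wedged; just exit. */
    static void
    quickdie_alarm_handler(int signo)
    {
        (void) signo;
        _exit(2);
    }

    /* Simplified stand-in for quickdie(): set a safety-net alarm, then try to report. */
    static void
    quickdie(int signo)
    {
        (void) signo;
        signal(SIGALRM, quickdie_alarm_handler);
        alarm(5);                   /* if the next step blocks, the handler above ends it */

        /* ... the "terminating connection" ereport() would happen here ... */

        _exit(2);                   /* normal path when nothing blocked */
    }

    int
    main(void)
    {
        signal(SIGQUIT, quickdie);
        raise(SIGQUIT);             /* demo only */
        return 0;
    }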
On 2013-02-01 08:55:24 -0500, Peter Eisentraut wrote:
> On 1/31/13 5:42 PM, MauMau wrote:
> > Thank you for sharing your experience. So you also considered making
> > postmaster SIGKILL children like me, didn't you? I bet most people
> > who encounter this problem would feel like that.
> >
>
On 1/31/13 5:42 PM, MauMau wrote:
> Thank you for sharing your experience. So you also considered making
> postmaster SIGKILL children like me, didn't you? I bet most people
> who encounter this problem would feel like that.
>
> It is definitely pg_ctl that needs to be prepared, not the users.
On 2013-01-22 22:19:25 -0500, Tom Lane wrote:
> Since we've fixed a couple of relatively nasty bugs recently, the core
> committee has determined that it'd be a good idea to push out PG update
> releases soon. The current plan is to wrap on Monday Feb 4 for public
> announcement Thursday Feb 7. I
MauMau wrote:
> Just doing "pkill postgres" will unexpectedly terminate postgres
> processes of other instances.
Not if you run each instance under a different OS user, and execute
pkill with the right user. (Never use root for that!) This is
just one of the reasons that you should not run multiple clus
From: "Peter Eisentraut"
On 1/30/13 9:11 AM, MauMau wrote:
When I ran "pg_ctl stop -mi" against the primary, some applications
connected to the primary did not stop. The cause was that the backends
were deadlocked in quickdie() with a call stack like the following.
I'm sorry to have left the
On 1/30/13 9:11 AM, MauMau wrote:
> When I ran "pg_ctl stop -mi" against the primary, some applications
> connected to the primary did not stop. The cause was that the backends
> were deadlocked in quickdie() with a call stack like the following.
> I'm sorry to have left the stack trace file on
"MauMau" writes:
> From: "Tom Lane"
>> The long and the short of it is that SIGQUIT is the emergency-stop panic
>> button. You don't use it for routine shutdowns --- you use it when
>> there is a damn good reason to and you're prepared to do some manual
>> cleanup if necessary.
> How about the
From: "Tom Lane"
"MauMau" writes:
I think the solution is the typical one. That is, to just remember the
receipt of SIGQUIT by setting a global variable and call siglongjmp() in
quickdie(), and perform tasks currently done in quickdie() when sigsetjmp()
returns in PostgresMain().
I think
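A minimal sketch of the pattern MauMau describes, with invented names: the
handler only records the event and jumps, and the reporting and cleanup that
quickdie() does today run back in ordinary context after sigsetjmp()
returns. Whether it is actually safe to longjmp out of arbitrary interrupted
code is a separate question.

    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>

    static sigjmp_buf            quickdie_jmp;
    static volatile sig_atomic_t quickdie_pending = 0;

    /* Signal handler does almost nothing: note the request and unwind. */
    static void
    quickdie(int signo)
    {
        (void) signo;
        quickdie_pending = 1;
        siglongjmp(quickdie_jmp, 1);    /* back to the sigsetjmp() below */
    }

    int
    main(void)
    {
        signal(SIGQUIT, quickdie);

        if (sigsetjmp(quickdie_jmp, 1) != 0 && quickdie_pending)
        {
            /* Ordinary context again: report and clean up before exiting
             * (safe in this demo because nothing non-reentrant was interrupted). */
            fprintf(stderr, "terminating connection due to immediate shutdown\n");
            exit(2);
        }

        /* ... the normal query-processing loop would run here ... */
        raise(SIGQUIT);                 /* demo: pretend the postmaster sent SIGQUIT */
        return 0;
    }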
Andres Freund writes:
> On 2013-01-30 10:23:09 -0500, Tom Lane wrote:
>> Yeah, it's a known hazard that quickdie() operates like that.
> What about not translating those? The messages are static and all memory
> needed by postgres should be pre-allocated.
That would reduce our exposure slightly,
On 2013-01-30 10:23:09 -0500, Tom Lane wrote:
> "MauMau" writes:
> > When I ran "pg_ctl stop -mi" against the primary, some applications
> > connected to the primary did not stop. ...
> > The root cause is that gettext() is called in the signal handler quickdie()
> > via errhint().
>
> Yeah, it
"MauMau" writes:
> When I ran "pg_ctl stop -mi" against the primary, some applications
> connected to the primary did not stop. ...
> The root cause is that gettext() is called in the signal handler quickdie()
> via errhint().
Yeah, it's a known hazard that quickdie() operates like that.
> I t
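Some background on why this deadlocks: gettext() may allocate memory, so if
SIGQUIT arrives while the interrupted code already holds malloc's internal
lock, the handler blocks on that same lock forever. A handler restricted to
async-signal-safe calls such as write() and _exit() cannot wedge that way.
Below is a minimal sketch of that restricted style, not the actual
quickdie() code, with an example message text.

    #include <signal.h>
    #include <unistd.h>

    /* Async-signal-safe emergency handler: no gettext(), no malloc(), no stdio. */
    static void
    emergency_quit(int signo)
    {
        static const char msg[] =
            "terminating connection because of crash of another server process\n";

        (void) signo;
        (void) write(STDERR_FILENO, msg, sizeof(msg) - 1);
        _exit(2);
    }

    int
    main(void)
    {
        signal(SIGQUIT, emergency_quit);
        raise(SIGQUIT);             /* demo only */
        return 0;
    }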
From: "Tom Lane"
Since we've fixed a couple of relatively nasty bugs recently, the core
committee has determined that it'd be a good idea to push out PG update
releases soon. The current plan is to wrap on Monday Feb 4 for public
announcement Thursday Feb 7. If you're aware of any bug fixes yo
On Sun, Jan 27, 2013 at 11:38 PM, MauMau wrote:
> From: "Fujii Masao"
>>
>> On Sun, Jan 27, 2013 at 12:17 AM, MauMau wrote:
>>>
>>> Although you said the fix will solve my problem, I don't feel it will. The
>>> discussion is about the crash when the standby "re"starts after the primary
>
From: "Fujii Masao"
On Sun, Jan 27, 2013 at 12:17 AM, MauMau wrote:
Although you said the fix will solve my problem, I don't feel it will. The
discussion is about the crash when the standby "re"starts after the primary
vacuums and truncates a table. On the other hand, in my case, the standb
On Sun, Jan 27, 2013 at 12:17 AM, MauMau wrote:
> From: "Fujii Masao"
>>
>> On Thu, Jan 24, 2013 at 11:53 PM, MauMau wrote:
>>>
>>> I'm wondering if the fix discussed in the above thread solves my problem. I
>>> found the following differences between Horiguchi-san's case and my case:
>>>
>>
From: "Fujii Masao"
On Thu, Jan 24, 2013 at 11:53 PM, MauMau wrote:
I'm wondering if the fix discussed in the above thread solves my problem. I
found the following differences between Horiguchi-san's case and my case:
(1)
Horiguchi-san says the bug outputs the message:
WARNING: page 0 of r
On Thu, Jan 24, 2013 at 11:53 PM, MauMau wrote:
> From: "Fujii Masao"
>>
>> On Thu, Jan 24, 2013 at 7:42 AM, MauMau wrote:
>>>
>>> I searched through PostgreSQL mailing lists with "WAL contains references to
>>> invalid pages", and I found 19 messages. Some people encountered a similar
>>> pr
From: "Fujii Masao"
On Thu, Jan 24, 2013 at 7:42 AM, MauMau wrote:
I searched through PostgreSQL mailing lists with "WAL contains references to
invalid pages", and I found 19 messages. Some people encountered a similar
problem. There were some discussions regarding those problems (Tom and
Sim
On Thu, Jan 24, 2013 at 7:42 AM, MauMau wrote:
> From: "Tom Lane"
>
>> Since we've fixed a couple of relatively nasty bugs recently, the core
>> committee has determined that it'd be a good idea to push out PG update
>> releases soon. The current plan is to wrap on Monday Feb 4 for public
>> ann
From: "Tom Lane"
Since we've fixed a couple of relatively nasty bugs recently, the core
committee has determined that it'd be a good idea to push out PG update
releases soon. The current plan is to wrap on Monday Feb 4 for public
announcement Thursday Feb 7. If you're aware of any bug fixes yo
Stephen Frost writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Since we've fixed a couple of relatively nasty bugs recently, the core
>> committee has determined that it'd be a good idea to push out PG update
>> releases soon. The current plan is to wrap on Monday Feb 4 for public
>> announcem
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Since we've fixed a couple of relatively nasty bugs recently, the core
> committee has determined that it'd be a good idea to push out PG update
> releases soon. The current plan is to wrap on Monday Feb 4 for public
> announcement Thursday Feb 7. If you'r
Since we've fixed a couple of relatively nasty bugs recently, the core
committee has determined that it'd be a good idea to push out PG update
releases soon. The current plan is to wrap on Monday Feb 4 for public
announcement Thursday Feb 7. If you're aware of any bug fixes you think
ought to get