Josh Berkus writes:
> On 10/02/2015 09:39 PM, Tom Lane wrote:
>> I wrote:
>>> Here's a rewritten patch that looks at postmaster.pid instead of
>>> pg_control. It should be effectively the same as the prior patch in terms
>>> of response to directory-removal cases, and it should also catch many
>>> overwrite cases.
On 10/02/2015 09:39 PM, Tom Lane wrote:
> I wrote:
>> Here's a rewritten patch that looks at postmaster.pid instead of
>> pg_control. It should be effectively the same as the prior patch in terms
>> of response to directory-removal cases, and it should also catch many
>> overwrite cases.
>
> BTW, my thought at the moment is to wait till after next week's releases
> to push this in.
Michael Paquier writes:
> On Sat, Oct 3, 2015 at 1:39 PM, Tom Lane wrote:
>> BTW, my thought at the moment is to wait till after next week's releases
>> to push this in. I think it's probably solid, but it doesn't seem like
>> it's worth taking the risk of pushing shortly before a wrap date.
>
On Sat, Oct 3, 2015 at 1:39 PM, Tom Lane wrote:
> I wrote:
> > Here's a rewritten patch that looks at postmaster.pid instead of
> > pg_control. It should be effectively the same as the prior patch in terms
> > of response to directory-removal cases, and it should also catch many
> > overwrite cases.
I wrote:
> Here's a rewritten patch that looks at postmaster.pid instead of
> pg_control. It should be effectively the same as the prior patch in terms
> of response to directory-removal cases, and it should also catch many
> overwrite cases.
BTW, my thought at the moment is to wait till after next week's releases
to push this in. I think it's probably solid, but it doesn't seem like
it's worth taking the risk of pushing shortly before a wrap date.
I wrote:
> It strikes me that a different approach that might be of value would
> be to re-read postmaster.pid and make sure that (a) it's still there
> and (b) it still contains the current postmaster's PID. This would
> be morally equivalent to what Jim suggests above, and it would dodge
> the c
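A minimal sketch of that re-read check, assuming an illustrative helper name
rather than anything from the actual patch: re-open postmaster.pid, parse the
PID on its first line, and confirm it is still our own.

    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    /*
     * Sketch only: a missing file or a foreign PID means the data directory
     * was removed or re-initdb'd underneath the running postmaster.
     */
    static bool
    lock_file_still_ours(const char *pidpath)
    {
        FILE   *fp = fopen(pidpath, "r");
        long    filepid;

        if (fp == NULL)
            return false;           /* postmaster.pid is gone */

        if (fscanf(fp, "%ld", &filepid) != 1)
        {
            fclose(fp);
            return false;           /* first line is not a PID we can parse */
        }
        fclose(fp);

        return filepid == (long) getpid();
    }

A false return from a check like this is what would drive the shutdown being
discussed in the rest of the thread.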
So, testing:
1. I tested running an AWS instance (Ubuntu 14.04) into 100% IOWAIT, and
the shutdown didn't kick in even when storage went full "d" state. It's
possible that other kinds of remote storage failures would cause a
shutdown, but don't we want them to?
2. I tested deleting /pgdata/* sev
Jim Nasby writes:
> Ouch. So it sounds like there's value to seeing if pg_control isn't what
> we expect it to be.
> Instead of looking at the inode (portability problem), what if
> pg_control contained a random number that was created at initdb time? On
> startup postmaster would read that va
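A sketch of how Jim's suggestion might look; the struct layout and names here
are illustrative, not the real pg_control format: initdb stores a random token
in the control file, the postmaster caches it at startup, and a periodic check
re-reads the file to confirm the token is unchanged.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct IllustrativeControlData
    {
        uint64_t    cluster_token;  /* random value written at initdb time */
        /* ... the real control file carries many more fields ... */
    } IllustrativeControlData;

    static uint64_t cached_token;   /* remembered by the postmaster at startup */

    static bool
    control_token_unchanged(const char *controlpath)
    {
        IllustrativeControlData cd;
        FILE   *fp = fopen(controlpath, "rb");

        if (fp == NULL)
            return false;                   /* control file has disappeared */

        if (fread(&cd, sizeof(cd), 1, fp) != 1)
        {
            fclose(fp);
            return false;                   /* short read: not the file we wrote */
        }
        fclose(fp);

        return cd.cluster_token == cached_token;
    }

A mismatch would mean the directory was re-initdb'd underneath the running
postmaster, which is exactly the overwrite case mentioned above.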
On 09/30/2015 01:18 AM, Michael Paquier wrote:
> On Wed, Sep 30, 2015 at 7:19 AM, Tom Lane wrote:
>> I wrote:
>>> Josh Berkus writes:
>>>> Give me source with the change, and I'll put it on a cheap, low-bandwidth
>>>> AWS instance and hammer the heck out of it. That should raise any false
>>>> positives we can expect.
On 9/29/15 4:13 PM, Alvaro Herrera wrote:
> Joe Conway wrote:
>> On 09/29/2015 01:48 PM, Alvaro Herrera wrote:
>>> I remember it, but I'm not sure it would have helped you. As I recall,
>>> your trouble was that after a reboot the init script decided to initdb
>>> the mount point -- postmaster wouldn't have been running at all ...
On Wed, Sep 30, 2015 at 7:19 AM, Tom Lane wrote:
> I wrote:
>> Josh Berkus writes:
>>> Give me source with the change, and I'll put it on a cheap, low-bandwidth
>>> AWS instance and hammer the heck out of it. That should raise any false
>>> positives we can expect.
>
>> Here's a draft patch against HEAD (looks like it will work on 9.5 or
>> 9.4 without modifications, too)
On 09/29/2015 12:47 PM, Tom Lane wrote:
> Josh Berkus writes:
>> In general, having the postmaster survive deletion of PGDATA is
>> suboptimal. In rare cases of having it survive installation of a new
>> PGDATA (via PITR restore, for example), I've even seen the zombie
>> postmaster corrupt the d
I wrote:
> Josh Berkus writes:
>> Give me source with the change, and I'll put it on a cheap, low-bandwidth
>> AWS instance and hammer the heck out of it. That should raise any false
>> positives we can expect.
> Here's a draft patch against HEAD (looks like it will work on 9.5 or
> 9.4 without modifications, too)
Josh Berkus writes:
> Give me source with the change, and I'll put it on a cheap, low-bandwidth
> AWS instance and hammer the heck out of it. That should raise any false
> positives we can expect.
Here's a draft patch against HEAD (looks like it will work on 9.5 or
9.4 without modifications, too)
Joe Conway wrote:
> On 09/29/2015 01:48 PM, Alvaro Herrera wrote:
> > I remember it, but I'm not sure it would have helped you. As I recall,
> > your trouble was that after a reboot the init script decided to initdb
> > the mount point -- postmaster wouldn't have been running at all ...
>
> Righ
On 09/29/2015 01:48 PM, Alvaro Herrera wrote:
> Joe Conway wrote:
>> On 09/29/2015 12:47 PM, Tom Lane wrote:
>>> We could possibly add additional checks, like trying to verify that
>>> pg_control has the same inode number it used to. But I'm afraid that
>>> would add portability issues and false-positive hazards that would
>>> outweigh the value.
Joe Conway wrote:
> On 09/29/2015 12:47 PM, Tom Lane wrote:
> > We could possibly add additional checks, like trying to verify that
> > pg_control has the same inode number it used to. But I'm afraid that
> > would add portability issues and false-positive hazards that would
> > outweigh the value.
On 09/29/2015 12:47 PM, Tom Lane wrote:
> We could possibly add additional checks, like trying to verify that
> pg_control has the same inode number it used to. But I'm afraid that
> would add portability issues and false-positive hazards that would
> outweigh the value.
Not sure you remember the
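For reference, the inode check being debated would amount to something like
the sketch below (illustrative names only): remember pg_control's st_dev and
st_ino at startup and later verify they have not changed. The portability and
false-positive worry is that those fields are not equally trustworthy on every
platform and filesystem.

    #include <stdbool.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    static dev_t saved_dev;
    static ino_t saved_ino;

    /* Called once at startup to remember the file's identity. */
    static bool
    remember_control_identity(const char *controlpath)
    {
        struct stat st;

        if (stat(controlpath, &st) != 0)
            return false;
        saved_dev = st.st_dev;
        saved_ino = st.st_ino;
        return true;
    }

    /* Called periodically; false means the file is gone or was replaced. */
    static bool
    control_identity_unchanged(const char *controlpath)
    {
        struct stat st;

        if (stat(controlpath, &st) != 0)
            return false;
        return st.st_dev == saved_dev && st.st_ino == saved_ino;
    }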
Tom Lane wrote:
> Testing accessibility of "global/pg_control" would be enough to catch this
> case, but only if we do it before you create a new one. So that seems
> like an argument for making the test relatively often. The once-a-minute
> option is sounding better and better.
If we weren't a
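A sketch of such an accessibility probe with a once-a-minute throttle; the
names and the interval are illustrative, and it relies on the postmaster's
working directory being $PGDATA so that a relative path suffices.

    #include <stdbool.h>
    #include <time.h>
    #include <unistd.h>

    #define CONTROL_RECHECK_INTERVAL 60     /* seconds */

    static time_t last_control_check;

    static bool
    control_file_still_accessible(void)
    {
        time_t  now = time(NULL);

        if (now - last_control_check < CONTROL_RECHECK_INTERVAL)
            return true;                    /* not due for another check yet */
        last_control_check = now;

        /* access() is cheap; R_OK is enough to notice deletion or a bad mount */
        return access("global/pg_control", R_OK) == 0;
    }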
Josh Berkus writes:
> On 09/29/2015 11:48 AM, Tom Lane wrote:
>> But today I thought of another way: suppose that we teach the postmaster
>> to commit hara-kiri if the $PGDATA directory goes away. Since the
>> buildfarm script definitely does remove all the temporary data directories
>> it creates, this ought to get the job
On 09/29/2015 12:18 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 09/29/2015 02:48 PM, Tom Lane wrote:
>>> Also, perhaps we'd only enable this behavior in --enable-cassert builds,
>>> to avoid any risk of a postmaster incorrectly choosing to suicide in a
>>> production scenario. Or maybe that's overly conservative.
Andrew Dunstan writes:
> On 09/29/2015 02:48 PM, Tom Lane wrote:
>> Also, perhaps we'd only enable this behavior in --enable-cassert builds,
>> to avoid any risk of a postmaster incorrectly choosing to suicide in a
>> production scenario. Or maybe that's overly conservative.
> Not every buildfarm
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> I wouldn't want to do this every time through the postmaster's main loop,
> >> but we could do this once an hour for no added cost by adding the check
> >> where it does TouchSocketLockFiles; or once every few minutes if we
> >> carried a
On 09/29/2015 02:48 PM, Tom Lane wrote:
> A problem the buildfarm has had for a long time is that if for some reason
> the scripts fail to stop a test postmaster, the postmaster process will
> hang around and cause subsequent runs to fail because of socket conflicts.
> This seems to have gotten a lot worse lately due to the influx of very slow b
Stephen Frost writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I wouldn't want to do this every time through the postmaster's main loop,
>> but we could do this once an hour for no added cost by adding the check
>> where it does TouchSocketLockFiles; or once every few minutes if we
>> carried a
On 09/29/2015 11:48 AM, Tom Lane wrote:
> But today I thought of another way: suppose that we teach the postmaster
> to commit hara-kiri if the $PGDATA directory goes away. Since the
> buildfarm script definitely does remove all the temporary data directories
> it creates, this ought to get the job
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> But today I thought of another way: suppose that we teach the postmaster
> to commit hara-kiri if the $PGDATA directory goes away. Since the
> buildfarm script definitely does remove all the temporary data directories
> it creates, this ought to get the job
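A sketch of the hara-kiri response itself (illustrative, not the committed
code): the postmaster chdir()s into $PGDATA at startup, and on Linux at least,
a directory that has been rmdir'd out from under its users still stat()s as
"." but reports a zero link count. Either symptom means the data directory is
gone and the only sane response is to log that and shut down.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>

    static void
    exit_if_data_directory_gone(void)
    {
        struct stat st;

        if (stat(".", &st) != 0 || st.st_nlink == 0)
        {
            fprintf(stderr, "data directory has disappeared, shutting down\n");
            exit(1);    /* a real postmaster would first signal its children */
        }
    }

Note that the rewritten patch discussed at the top of the thread ended up
looking at postmaster.pid instead of the directory or pg_control.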
A problem the buildfarm has had for a long time is that if for some reason
the scripts fail to stop a test postmaster, the postmaster process will
hang around and cause subsequent runs to fail because of socket conflicts.
This seems to have gotten a lot worse lately due to the influx of very
slow b