Re: Checkpoint not retrying failed fsync?

2018-07-12 Thread Thomas Munro
On Tue, Jun 12, 2018 at 3:31 PM, Thomas Munro wrote: > I was about to mark this patch "rejected" and forget about it, since > Craig's patch makes it redundant. But then I noticed that Craig's > patch doesn't actually remove the retry behaviour completely: it > promotes only EIO and ENOSPC to

Re: Checkpoint not retrying failed fsync?

2018-06-11 Thread Thomas Munro
On Sun, Apr 8, 2018 at 9:17 PM, Thomas Munro wrote: > New patch attached. For Linux users (read: almost all users) this patch on its own is a bit like rearranging the deck chairs on the Titanic. Because retrying on failure is useless, among other problems, Craig and Andres are working on

Re: Checkpoint not retrying failed fsync?

2018-04-08 Thread Thomas Munro
On Sun, Apr 8, 2018 at 5:36 PM, Amit Kapila wrote: > Won't in the success case, you need to delete each member (by > something like bms_del_member) rather than just using bms_free? Thanks for looking at this. Yeah, if requests for segment numbers 0 and 1 were in

Re: Checkpoint not retrying failed fsync?

2018-04-07 Thread Amit Kapila
On Fri, Apr 6, 2018 at 6:26 AM, Thomas Munro wrote: > On Fri, Apr 6, 2018 at 11:36 AM, Thomas Munro > wrote: >> On Fri, Apr 6, 2018 at 11:34 AM, Andrew Gierth >> wrote: >>> Right. >>> >>> But I don't

Re: Checkpoint not retrying failed fsync?

2018-04-05 Thread Thomas Munro
On Fri, Apr 6, 2018 at 12:56 PM, Thomas Munro wrote: > After some testing, here is a better one for review. One problem I thought of about 8 milliseconds after clicking send is that bms_union() may fail to allocate memory and then you're hosed. Here is a new

Re: Checkpoint not retrying failed fsync?

2018-04-05 Thread Thomas Munro
On Fri, Apr 6, 2018 at 11:36 AM, Thomas Munro wrote: > On Fri, Apr 6, 2018 at 11:34 AM, Andrew Gierth > wrote: >> Right. >> >> But I don't think just copying the value is sufficient; if a new bit was >> set while we were processing the

Re: Checkpoint not retrying failed fsync?

2018-04-05 Thread Thomas Munro
On Fri, Apr 6, 2018 at 11:34 AM, Andrew Gierth wrote: >> "Thomas" == Thomas Munro writes: > > >> As far as I can tell from reading the code, if a checkpoint fails the > >> checkpointer is supposed to keep all the outstanding fsync

Re: Checkpoint not retrying failed fsync?

2018-04-05 Thread Andrew Gierth
> "Thomas" == Thomas Munro writes: >> As far as I can tell from reading the code, if a checkpoint fails the >> checkpointer is supposed to keep all the outstanding fsync requests for >> next time. Am I wrong, or is there some failure in the logic to do this?

Re: Checkpoint not retrying failed fsync?

2018-04-05 Thread Thomas Munro
On Fri, Apr 6, 2018 at 10:16 AM, Andrew Gierth wrote: > Furthermore, checking the trace output from the checkpointer process, it > is not even attempting an fsync of the failing file; this isn't like the > Linux fsync issue, I've confirmed that fsync will repeatedly

Checkpoint not retrying failed fsync?

2018-04-05 Thread Andrew Gierth
This is only a preliminary report; I'm still trying to analyze what's going on, but: In doing testing on FreeBSD with a filesystem set up to induce errors controllably (using gconcat+gnop), I can get this to happen (on 11devel): (note that "mytable" is on a tablespace on the erroring filesystem,