Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-12-13 Thread Thomas Munro
On Sat, Dec 14, 2019 at 5:32 PM Thomas Munro wrote: > On Sat, Dec 14, 2019 at 5:05 PM Thomas Munro wrote: > > > Pushed. > > > > Build farm not happy... checking... > > Hrmph. FileGetRawDesc() does not contain a call to FileAccess(), so > this is failing on low-fd-limit systems. Looking into a w

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-12-13 Thread Thomas Munro
On Sat, Dec 14, 2019 at 5:05 PM Thomas Munro wrote:
> > Pushed.
>
> Build farm not happy... checking...
Hrmph. FileGetRawDesc() does not contain a call to FileAccess(), so this is failing on low-fd-limit systems. Looking into a way to fix that...

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-12-13 Thread Thomas Munro
On Sat, Dec 14, 2019 at 4:49 PM Thomas Munro wrote:
> On Fri, Dec 13, 2019 at 5:41 PM Thomas Munro wrote:
> > Here's a better version: it uses the existing fd if we have it already
> > in md_seg_fds, but opens and closes a transient one if not.
>
> Pushed.
Build farm not happy... checking...

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-12-13 Thread Thomas Munro
On Fri, Dec 13, 2019 at 5:41 PM Thomas Munro wrote:
> Here's a better version: it uses the existing fd if we have it already
> in md_seg_fds, but opens and closes a transient one if not.
Pushed.
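
For readers skimming the thread, the "existing fd if we have it, transient one if not" idea quoted above can be sketched in plain POSIX C. This is only an illustration, not the committed PostgreSQL code: SegmentCache and sync_segment() are hypothetical names invented here, and the real patch works with md_seg_fds and PostgreSQL's virtual file descriptor layer rather than raw descriptors.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for a cached segment descriptor; fd < 0 means "not open". */
typedef struct SegmentCache
{
    int fd;
} SegmentCache;

/*
 * Sync one segment file. Reuse the cached descriptor when there is one;
 * otherwise open a transient descriptor, fsync it, and close it again.
 * Returns 0 on success, -1 on failure with errno set by open() or fsync().
 */
static int
sync_segment(const SegmentCache *cache, const char *path)
{
    int fd;
    int rc;
    int transient = 0;

    assert(path != NULL);

    if (cache != NULL && cache->fd >= 0)
        fd = cache->fd;             /* already open: no extra open/close */
    else
    {
        fd = open(path, O_RDWR);    /* not cached: open just for this sync */
        if (fd < 0)
            return -1;              /* e.g. ENOENT if the file was unlinked */
        transient = 1;
    }

    rc = fsync(fd);

    if (transient)
    {
        int saved_errno = errno;    /* keep fsync's errno across close() */

        close(fd);
        errno = saved_errno;
    }
    return rc;
}
```

In this sketch a failed open() is reported separately from a failed fsync() only through errno, so a caller that wants to treat a missing file differently from a failed sync would check for ENOENT itself.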

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-12-12 Thread Thomas Munro
On Sat, Nov 30, 2019 at 10:57 AM Thomas Munro wrote: > On Fri, Nov 29, 2019 at 12:34 PM Thomas Munro wrote: > > ... or stop using > > _mdfd_getseg() for this so that you can remove segments independently > > without worrying about sync requests for other segments (it was > > actually like that in

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-29 Thread Thomas Munro
On Fri, Nov 29, 2019 at 12:34 PM Thomas Munro wrote: > ... or stop using > _mdfd_getseg() for this so that you can remove segments independently > without worrying about sync requests for other segments (it was > actually like that in an earlier version of the patch for commit > 3eb77eba, but some

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-28 Thread Thomas Munro
On Fri, Nov 29, 2019 at 11:14 AM Justin Pryzby wrote: > On Fri, Nov 29, 2019 at 10:50:36AM +1300, Thomas Munro wrote: > > On Fri, Nov 29, 2019 at 3:13 AM Thomas Munro wrote: > > > On Wed, Nov 27, 2019 at 7:53 PM Justin Pryzby > > > wrote: > > > > 2019-11-26 23:41:50.009-05 | could not fsync fi

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-28 Thread Justin Pryzby
On Fri, Nov 29, 2019 at 10:50:36AM +1300, Thomas Munro wrote: > On Fri, Nov 29, 2019 at 3:13 AM Thomas Munro wrote: > > On Wed, Nov 27, 2019 at 7:53 PM Justin Pryzby wrote: > > > 2019-11-26 23:41:50.009-05 | could not fsync file > > > "pg_tblspc/16401/PG_12_201909212/16460/973123799.10": No suc

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-28 Thread Thomas Munro
On Fri, Nov 29, 2019 at 3:13 AM Thomas Munro wrote: > On Wed, Nov 27, 2019 at 7:53 PM Justin Pryzby wrote: > > 2019-11-26 23:41:50.009-05 | could not fsync file > > "pg_tblspc/16401/PG_12_201909212/16460/973123799.10": No such file or > > directory > > I managed to reproduce this (see below).

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-28 Thread Thomas Munro
On Wed, Nov 27, 2019 at 7:53 PM Justin Pryzby wrote: > 2019-11-26 23:41:50.009-05 | could not fsync file > "pg_tblspc/16401/PG_12_201909212/16460/973123799.10": No such file or > directory I managed to reproduce this (see below). I think I know what the problem is: mdsyncfiletag() uses _mdfd_

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-26 Thread Justin Pryzby
This same crash occurred on a 2nd server. Also qemu/KVM, but this time on a 2ndary ZFS tablespaces which (fails to) include the missing relfilenode. Linux database7 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18 15:06:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux This is postgresql12-12.1-1PGDG.rhel7.

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-25 Thread Thomas Munro
On Tue, Nov 26, 2019 at 5:21 PM Justin Pryzby wrote: > I looked and found a new "hint". > > On Tue, Nov 19, 2019 at 05:57:59AM -0600, Justin Pryzby wrote: > > < 2019-11-15 22:16:07.098 EST >PANIC: could not fsync file > > "base/16491/1731839470.2": No such file or directory > > < 2019-11-15 22:

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-25 Thread Justin Pryzby
I looked and found a new "hint". On Tue, Nov 19, 2019 at 05:57:59AM -0600, Justin Pryzby wrote: > < 2019-11-15 22:16:07.098 EST >PANIC: could not fsync file > "base/16491/1731839470.2": No such file or directory > < 2019-11-15 22:16:08.751 EST >LOG: checkpointer process (PID 27388) was > ter

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-21 Thread Craig Ringer
On Thu, 21 Nov 2019 at 09:07, Justin Pryzby wrote: > On Tue, Nov 19, 2019 at 07:22:26PM -0600, Justin Pryzby wrote: > > I was trying to reproduce what was happening: > > set -x; psql postgres -txc "DROP TABLE IF EXISTS t" -c "CREATE TABLE t(i > int unique); INSERT INTO t SELECT generate_series(1,

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-20 Thread Justin Pryzby
On Tue, Nov 19, 2019 at 07:22:26PM -0600, Justin Pryzby wrote: > I was trying to reproduce what was happening: > set -x; psql postgres -txc "DROP TABLE IF EXISTS t" -c "CREATE TABLE t(i int > unique); INSERT INTO t SELECT generate_series(1,99)"; echo "begin;SELECT > pg_export_snapshot(); SELE

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-19 Thread Justin Pryzby
On Tue, Nov 19, 2019 at 04:49:10PM -0600, Justin Pryzby wrote: > On Wed, Nov 20, 2019 at 09:26:53AM +1300, Thomas Munro wrote: > > Perhaps we should not panic if we failed to open (not fsync) the file, > > but it's not the root problem here which is that somehow we thought we > > should be fsyncing

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-19 Thread Justin Pryzby
On Wed, Nov 20, 2019 at 09:26:53AM +1300, Thomas Munro wrote: > Perhaps we should not panic if we failed to open (not fsync) the file, > but it's not the root problem here which is that somehow we thought we > should be fsyncing a file that had apparently been removed already > (due to CLUSTER, VAC

Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-19 Thread Thomas Munro
On Wed, Nov 20, 2019 at 12:58 AM Justin Pryzby wrote: > < 2019-11-15 22:16:07.098 EST >PANIC: could not fsync file > "base/16491/1731839470.2": No such file or directory > < 2019-11-15 22:16:08.751 EST >LOG: checkpointer process (PID 27388) was > terminated by signal 6: Aborted > > /dev/vdb

checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-19 Thread Justin Pryzby
I (finally) noticed this morning on a server running PG12.1: < 2019-11-15 22:16:07.098 EST >PANIC: could not fsync file "base/16491/1731839470.2": No such file or directory < 2019-11-15 22:16:08.751 EST >LOG: checkpointer process (PID 27388) was terminated by signal 6: Aborted /dev/vdb on /