Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-27 Thread Tom Lane
Andres Freund writes:
> On 2023-02-27 12:42:00 -0500, Tom Lane wrote:
>> I went ahead and coded it that way, and it doesn't look too awful.
>> Any objections?
> Looks good to me.
> I think it'd be an indication of a bug around the invalidation handling if the
> terminations were required.
So

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-27 Thread Andres Freund
Hi,
On 2023-02-27 12:42:00 -0500, Tom Lane wrote:
> I wrote:
> > Hah - I thought of a solution. We can avoid this race condition if
> > we make the remote session itself inspect pg_stat_activity and
> > return its displayed application_name. Just need a foreign table
> > that maps onto

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-27 Thread Tom Lane
I wrote:
> Hah - I thought of a solution. We can avoid this race condition if
> we make the remote session itself inspect pg_stat_activity and
> return its displayed application_name. Just need a foreign table
> that maps onto pg_stat_activity.
I went ahead and coded it that way, and it doesn't
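The idea of letting the remote session report its own application_name can be sketched roughly as follows. This is a hypothetical illustration, not necessarily the code that was committed. It assumes the postgres_fdw extension plus a foreign server named `loopback2`, with a user mapping, that points back at the same database. The names `my_application_name` and `remote_application_name` are made up for the example. It also varies the idea slightly: the foreign table maps onto a view over pg_stat_activity rather than onto pg_stat_activity directly, because postgres_fdw only ships immutable functions to the remote side. A local `WHERE pid = pg_backend_pid()` on the foreign table would therefore be evaluated with the local backend's pid.

```sql
-- Hypothetical sketch: assumes postgres_fdw is installed and a foreign
-- server "loopback2" (with a user mapping) points back at this database,
-- so a view created here is also visible to the "remote" session.

-- Evaluated by whichever backend runs it: pg_backend_pid() is computed
-- inside the view, i.e. by the remote session itself.
CREATE VIEW my_application_name AS
  SELECT application_name FROM pg_stat_activity WHERE pid = pg_backend_pid();

-- Foreign table mapped onto that view (hypothetical names).
CREATE FOREIGN TABLE remote_application_name (application_name text)
  SERVER loopback2
  OPTIONS (schema_name 'public', table_name 'my_application_name');

-- Drop cached connections so the next query opens a fresh one that uses
-- the new setting; "%%" is postgres_fdw's escape for a literal "%".
SELECT 1 FROM postgres_fdw_disconnect_all();
SET postgres_fdw.application_name TO 'fdw_%%_check';

-- The remote session reports its own name; no pg_terminate_backend()
-- round trip is needed to prove that the session exists.
SELECT application_name = 'fdw_%_check' AS ok FROM remote_application_name;
```

With this design, the row being checked always belongs to the session that answers the query. A cache flush that drops the connection earlier only means a new connection gets opened, and that connection picks up the same setting.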

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-27 Thread Tom Lane
I wrote:
> ... maybe we could do "select 1 from
> pg_stat_activity where application_name = computed-pattern", but that
> has the same problem that a cache flush might have terminated the
> remote session.
Hah - I thought of a solution. We can avoid this race condition if we make the remote

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Tom Lane
Andres Freund writes:
> Not that I understand why that tries to terminate connections, instead of just
> looking at application name.
The test is trying to verify the application name reported by the "remote" session, which isn't constant, so we can't just do "select application_name from

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Andres Freund
Hi,
On 2023-02-26 15:57:01 -0500, Tom Lane wrote:
> However, the other stanza with debug_discard_caches muckery is the
> one about "test postgres_fdw.application_name GUC", and in that case
> ignoring the number of terminated connections would destroy the
> point of the test entirely; because

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Tom Lane
I wrote:
> I'm inclined to think we should indeed just nuke that test. It's
> overcomplicated and it expends a lot of test cycles on a pretty
> marginal feature.
Perhaps a better idea: at the start of the test, set postgres_fdw.application_name to something that exercises all the available

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Tom Lane
Andres Freund writes:
> Hm, yea, that should work. It's indeed the entirety of the diff
> https://api.cirrus-ci.com/v1/artifact/task/4718859714822144/testrun/build/testrun/postgres_fdw-running/regress/regression.diffs
> If we go that way we can remove the debug_discard muckery as well, I think?

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Tom Lane
Andres Freund writes:
> On 2023-02-26 14:51:45 -0500, Tom Lane wrote:
>> If that's the only diff, we could just hide it, say by writing
> Hm, yea, that should work. It's indeed the entirety of the diff
>

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Andres Freund
Hi,
On 2023-02-26 14:51:45 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2022-12-08 16:15:11 -0800, Andres Freund wrote:
> >> The most frequent case is postgres_fdw, which somewhat regularly fails
> >> with a
> >> regression.diff like this:
> >> WHERE application_name =

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Tom Lane
Andres Freund writes:
> On 2022-12-08 16:15:11 -0800, Andres Freund wrote:
>> The most frequent case is postgres_fdw, which somewhat regularly fails with a
>> regression.diff like this:
>> WHERE application_name = 'fdw_retry_check';
>> pg_terminate_backend
>> --
>> - t
>> -(1

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-26 Thread Andres Freund
Hi,
On 2022-12-08 16:15:11 -0800, Andres Freund wrote:
> The most frequent case is postgres_fdw, which somewhat regularly fails with a
> regression.diff like this:
>
> diff -U3 /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out
>

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-09 Thread Peter Geoghegan
On Wed, Feb 8, 2023 at 7:18 PM Andres Freund wrote:
> > This is a good thing for performance, of course, but it also makes VACUUM
> > VERBOSE show information that makes sense to users, since things actually
> > happen in a way that makes a lot more sense. I'm quite happy about the fact
> > that

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-08 Thread Andres Freund
Hi,
On 2023-02-08 18:37:41 -0800, Peter Geoghegan wrote:
> On Wed, Feb 8, 2023 at 4:29 PM Andres Freund wrote:
> > 2) Add a message to lazy_vacuum() or lazy_vacuum_all_indexes(), that
> > includes
> >- num_index_scans
> >- how many indexes we'll scan
> >- how many dead tids we're

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-08 Thread Peter Geoghegan
On Wed, Feb 8, 2023 at 4:29 PM Andres Freund wrote:
> I find it useful information when debugging problems. Without it, the log
> doesn't tell you which index was processed when a problem started to occur. Or
> even that we were scanning indexes at all.
I guess it might have some limited value

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-08 Thread Andres Freund
Hi,
On 2023-02-08 14:03:49 -0800, Peter Geoghegan wrote:
> On Tue, Feb 7, 2023 at 6:47 PM Andres Freund wrote:
> > One thing I'm not quite sure what to do about is that we atm use a hardcoded
> > DEBUG2 (not controlled by VERBOSE) in a bunch of places:
> >
> > ereport(DEBUG2,
> >

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-08 Thread Peter Geoghegan
On Tue, Feb 7, 2023 at 6:47 PM Andres Freund wrote:
> One thing I'm not quite sure what to do about is that we atm use a hardcoded
> DEBUG2 (not controlled by VERBOSE) in a bunch of places:
>
> ereport(DEBUG2,
> (errmsg("table \"%s\": removed %lld dead item
>

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-07 Thread Andres Freund
Hi,
On 2023-02-06 17:53:00 -0800, Andres Freund wrote:
> Another run hit an issue we've been fighting repeatedly on the buildfarm / CI:
> https://cirrus-ci.com/task/5527490404286464
>

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-06 Thread Andres Freund
Hi,
On 2023-02-06 19:29:46 -0800, Andres Freund wrote:
> There's something off. Isolationtester's control connection emits *loads* of
> invalidation messages:
> 2023-02-06 19:29:06.430 PST [2125297][client
> backend][6/0:121864][isolation/receipt-report/control connection] LOG:
> previously

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-06 Thread Andres Freund
Hi,
On 2023-02-06 17:53:00 -0800, Andres Freund wrote:
> WRT the fdw_retry_check: I wonder if we should increase the log level of
> a) pgfdw_inval_callback deciding to disconnect
> b) ReceiveSharedInvalidMessages() deciding to reset
>
> to DEBUG1, at least temporarily?
>
> Alternatively we could

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2023-02-06 Thread Andres Freund
Hi,
On 2022-12-08 16:36:07 -0800, Andres Freund wrote:
> On 2022-12-08 16:15:11 -0800, Andres Freund wrote:
> > Unfortunately cfbot shows that that doesn't work entirely reliably.
> >
> > The most frequent case is postgres_fdw, which somewhat regularly fails with
> > a
> > regression.diff like

Re: tests against running server occasionally fail, postgres_fdw & tenk1

2022-12-08 Thread Andres Freund
Hi,
On 2022-12-08 16:15:11 -0800, Andres Freund wrote:
> commit 3f0e786ccbf
> Author: Andres Freund
> Date: 2022-12-07 12:13:35 -0800
>
> meson: Add 'running' test setup, as a replacement for installcheck
>
> CI tests the pg_regress/isolationtester tests that support doing so against a
>

tests against running server occasionally fail, postgres_fdw & tenk1

2022-12-08 Thread Andres Freund
Hi,
Since
commit 3f0e786ccbf
Author: Andres Freund
Date: 2022-12-07 12:13:35 -0800
meson: Add 'running' test setup, as a replacement for installcheck
CI tests the pg_regress/isolationtester tests that support doing so against a running server. Unfortunately cfbot shows that that