Re: Windows buildfarm members vs. new async-notify isolation test

2020-08-29 Thread Tom Lane
Thomas Munro writes: > I made a thing to watch out for low probability BF failures and it > told me that a similar failure in async-notify might have reappeared > on brolga: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=brolga&dt=2020-07-15%2008:30:11 > | REL_10_STABLE > [ etc ]

Re: Windows buildfarm members vs. new async-notify isolation test

2020-08-29 Thread Thomas Munro
On Thu, Dec 12, 2019 at 9:11 AM Tom Lane wrote: > Amit Kapila writes: > > I am convinced by your points. So +1 for your proposed patch. I have > > already reviewed it yesterday and it appears fine to me. > > OK, pushed. Thanks for reviewing! I made a thing to watch out for low probability BF

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-11 Thread Tom Lane
Amit Kapila writes: > I am convinced by your points. So +1 for your proposed patch. I have > already reviewed it yesterday and it appears fine to me. OK, pushed. Thanks for reviewing! regards, tom lane

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-10 Thread Amit Kapila
On Tue, Dec 10, 2019 at 9:27 PM Tom Lane wrote: > > Amit Kapila writes: > > On Sun, Dec 8, 2019 at 10:27 PM Tom Lane wrote: > >> Doing it like this seems attractive to me because it gets rid of two > >> different failure modes: inability to create a new thread and inability > >> to create a new

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-10 Thread Tom Lane
Amit Kapila writes: > On Sun, Dec 8, 2019 at 10:27 PM Tom Lane wrote: >> Doing it like this seems attractive to me because it gets rid of two >> different failure modes: inability to create a new thread and inability >> to create a new pipe handle. Now on the other hand, it means that >> inabili

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-09 Thread Amit Kapila
On Sun, Dec 8, 2019 at 10:27 PM Tom Lane wrote: > > I wrote: > > So, just idly looking at the code in src/backend/port/win32/signal.c > > and src/port/kill.c, I have to wonder why we have this baroque-looking > > design of using *two* signal management threads. And, if I'm > > reading it right, w

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-09 Thread Tom Lane
Andrew Dunstan writes: > Patch 1 fixed the problems on frogmouth. Cool, thanks. I'll push that in a bit (to the back branches as well as HEAD). > Patch 2 also ran without incident. What do people think about the second patch? I'd only propose that for HEAD, since it's not really a bug fix.

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-09 Thread Andrew Dunstan
On 12/8/19 11:57 AM, Tom Lane wrote: > I wrote: >> So, just idly looking at the code in src/backend/port/win32/signal.c >> and src/port/kill.c, I have to wonder why we have this baroque-looking >> design of using *two* signal management threads. And, if I'm >> reading it right, we create an enti

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-08 Thread Tom Lane
I wrote: > So, just idly looking at the code in src/backend/port/win32/signal.c > and src/port/kill.c, I have to wonder why we have this baroque-looking > design of using *two* signal management threads. And, if I'm > reading it right, we create an entire new pipe object and an entire > new instan

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-08 Thread Tom Lane
I wrote: > Amit Kapila writes: >> IIUC, once the dispatch thread has queued the signal >> (pg_queue_signal), the next CHECK_FOR_INTERRUPTS by the main thread >> will execute the signal. So, if we move pg_queue_signal() before we >> do WriteFile in pg_signal_dispatch_thread(), this race condition

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-08 Thread Tom Lane
Amit Kapila writes: > IIUC, once the dispatch thread has queued the signal > (pg_queue_signal), the next CHECK_FOR_INTERRUPTS by the main thread > will execute the signal. So, if we move pg_queue_signal() before we > do WriteFile in pg_signal_dispatch_thread(), this race condition will > be close

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-08 Thread Amit Kapila
On Sun, Dec 8, 2019 at 10:44 AM Amit Kapila wrote: > > On Sun, Dec 8, 2019 at 1:26 AM Tom Lane wrote: > > > > So, just idly looking at the code in src/backend/port/win32/signal.c > > and src/port/kill.c, I have to wonder why we have this baroque-looking > > design of using *two* signal management

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-07 Thread Amit Kapila
On Sat, Dec 7, 2019 at 10:50 PM Tom Lane wrote: > > Amit Kapila writes: > > On Sat, Dec 7, 2019 at 5:01 AM Tom Lane wrote: > >> A possible theory as to what's happening is that the kernel scheduler > >> is discriminating against listener2's signal management thread(s) > >> and not running them u

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-07 Thread Amit Kapila
On Sun, Dec 8, 2019 at 1:26 AM Tom Lane wrote: > > So, just idly looking at the code in src/backend/port/win32/signal.c > and src/port/kill.c, I have to wonder why we have this baroque-looking > design of using *two* signal management threads. And, if I'm > reading it right, we create an entire n

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-07 Thread Tom Lane
I wrote: > Amit Kapila writes: >> On Sat, Dec 7, 2019 at 5:01 AM Tom Lane wrote: >>> A possible theory as to what's happening is that the kernel scheduler >>> is discriminating against listener2's signal management thread(s) >>> and not running them until everything else goes idle for a moment.

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-07 Thread Tom Lane
Amit Kapila writes: > On Sat, Dec 7, 2019 at 5:01 AM Tom Lane wrote: >> A possible theory as to what's happening is that the kernel scheduler >> is discriminating against listener2's signal management thread(s) >> and not running them until everything else goes idle for a moment. > If we have to

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-06 Thread Amit Kapila
On Sat, Dec 7, 2019 at 5:01 AM Tom Lane wrote: > > Andrew Dunstan writes: > > On 12/5/19 4:37 AM, Amit Kapila wrote: > >> IIUC, this means that commit (step l2commit) is finishing before the > >> notify signal is reached that session. If so, can we at least confirm > >> that by adding something

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-06 Thread Tom Lane
Andrew Dunstan writes: > On 12/5/19 4:37 AM, Amit Kapila wrote: >> IIUC, this means that commit (step l2commit) is finishing before the >> notify signal is reached that session. If so, can we at least confirm >> that by adding something like select pg_sleep(1) in that step? So, >> l2commit will

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-06 Thread Andrew Dunstan
On 12/5/19 4:37 AM, Amit Kapila wrote: > On Wed, Dec 4, 2019 at 9:51 PM Andrew Dunstan > wrote: >> On Wed, Dec 4, 2019 at 12:12 AM Tom Lane wrote: >>> Amit Kapila writes: On Tue, Dec 3, 2019 at 10:10 PM Tom Lane wrote: > Hmm ... just looking at the code again, could it be that there'

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-05 Thread Amit Kapila
On Wed, Dec 4, 2019 at 9:51 PM Andrew Dunstan wrote: > > On Wed, Dec 4, 2019 at 12:12 AM Tom Lane wrote: > > > > Amit Kapila writes: > > > On Tue, Dec 3, 2019 at 10:10 PM Tom Lane wrote: > > >> Hmm ... just looking at the code again, could it be that there's > > >> no well-placed CHECK_FOR_INTE

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-04 Thread Andrew Dunstan
On Wed, Dec 4, 2019 at 12:12 AM Tom Lane wrote: > > Amit Kapila writes: > > On Tue, Dec 3, 2019 at 10:10 PM Tom Lane wrote: > >> Hmm ... just looking at the code again, could it be that there's > >> no well-placed CHECK_FOR_INTERRUPTS? Andrew, could you see if > >> injecting one in what 7900269

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-03 Thread Tom Lane
Amit Kapila writes: > On Tue, Dec 3, 2019 at 10:10 PM Tom Lane wrote: >> Hmm ... just looking at the code again, could it be that there's >> no well-placed CHECK_FOR_INTERRUPTS? Andrew, could you see if >> injecting one in what 790026972 added to postgres.c helps? > I also tried to analyze this

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-03 Thread Amit Kapila
On Tue, Dec 3, 2019 at 10:10 PM Tom Lane wrote: > > In principle, the issue should not be there, because commits > 790026972 et al should have ensured that the NOTIFY protocol > message comes out before ReadyForQuery (and thus, libpq will > absorb it before PQgetResult will return NULL). I think

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-03 Thread Tom Lane
Mark Dilger writes: > On 12/2/19 11:42 AM, Andrew Dunstan wrote: >> On 12/2/19 11:23 AM, Tom Lane wrote: >>> I'm a little baffled as to what this might be --- some sort of >>> timing problem in our Windows signal emulation, perhaps? But >>> if so, why haven't we found it years ago? > I would be

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-03 Thread Mark Dilger
On 12/2/19 11:42 AM, Andrew Dunstan wrote: On 12/2/19 11:23 AM, Tom Lane wrote: I see from the buildfarm status page that since commits 6b802cfc7 et al went in a week ago, frogmouth and currawong have failed that new test case every time, with the symptom == pgsql.build/src/

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-02 Thread Andrew Dunstan
On 12/2/19 11:23 AM, Tom Lane wrote: > I see from the buildfarm status page that since commits 6b802cfc7 > et al went in a week ago, frogmouth and currawong have failed that > new test case every time, with the symptom > > == pgsql.build/src/test/isolation/regression.diffs >

Windows buildfarm members vs. new async-notify isolation test

2019-12-02 Thread Tom Lane
I see from the buildfarm status page that since commits 6b802cfc7 et al went in a week ago, frogmouth and currawong have failed that new test case every time, with the symptom == pgsql.build/src/test/isolation/regression.diffs === *** c:/prog/bf/root/REL_10_STABLE