Re: [HACKERS] Unportable implementation of background worker start

2017-04-27 Thread Tom Lane
Andres Freund writes: > On 2017-04-27 16:35:29 -0400, Tom Lane wrote: >> It looks like it might be sufficient to do "#ifdef EPOLL_CLOEXEC" >> in latch.c, rather than bothering with a full-blown configure check. > Yea, that sounds worth trying. Wonder if we need to care about

Re: [HACKERS] Unportable implementation of background worker start

2017-04-27 Thread Andres Freund
On 2017-04-27 16:35:29 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-04-26 17:05:39 -0400, Tom Lane wrote: > >> I went ahead and changed the call to epoll_create into epoll_create1. > >> I'm not too concerned about loss of portability there --- it seems > >>

Re: [HACKERS] Unportable implementation of background worker start

2017-04-27 Thread Tom Lane
Andres Freund writes: > On 2017-04-26 17:05:39 -0400, Tom Lane wrote: >> I went ahead and changed the call to epoll_create into epoll_create1. >> I'm not too concerned about loss of portability there --- it seems >> unlikely that many people are still using ten-year-old glibc,

Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Tom Lane
Robert Haas writes: > On Wed, Apr 26, 2017 at 8:37 PM, Andres Freund wrote: >>> 3. Go ahead with converting the postmaster to use WaitEventSet, a la >>> the draft patch I posted earlier. I'd be happy to do this if we were >>> at the start of a devel

Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Andres Freund
On 2017-04-26 17:05:39 -0400, Tom Lane wrote: > Here's an updated version of that, which makes use of our previous > conclusion that F_SETFD/FD_CLOEXEC are available everywhere except > Windows, and fixes some sloppy thinking about the EXEC_BACKEND case. > > I went ahead and changed the call to

Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Robert Haas
On Wed, Apr 26, 2017 at 8:37 PM, Andres Freund wrote: >> 3. Go ahead with converting the postmaster to use WaitEventSet, a la >> the draft patch I posted earlier. I'd be happy to do this if we were >> at the start of a devel cycle, but right now seems a bit late --- not >> to

Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Andres Freund
On 2017-04-26 11:42:38 -0400, Tom Lane wrote: > 1. Let HEAD stand as it is. We have a problem with slow response to > bgworker start requests that arrive while ServerLoop is active, but that's > a pretty tight window usually (although I believe I've seen it hit at > least once in testing). > >

Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Tom Lane
Andres Freund writes: > I'd still like to get something like your CLOEXEC patch applied > independently however. Here's an updated version of that, which makes use of our previous conclusion that F_SETFD/FD_CLOEXEC are available everywhere except Windows, and fixes some

Re: [HACKERS] Unportable implementation of background worker start

2017-04-26 Thread Tom Lane
I wrote: > =?utf-8?Q?R=C3=A9mi_Zara?= writes: >> coypu was not stuck (no buildfarm related process running), but failed to >> clean-up shared memory and semaphores. >> I’ve done the clean-up. > Huh, that's even more interesting. I installed NetBSD 5.1.5 on an old Mac G4; I

Re: [HACKERS] Unportable implementation of background worker start

2017-04-25 Thread Tom Lane
=?utf-8?Q?R=C3=A9mi_Zara?= writes: >> Le 25 avr. 2017 à 01:47, Tom Lane a écrit : >> It looks like coypu is going to need manual intervention (ie, kill -9 >> on the leftover postmaster) to get unwedged :-(. That's particularly >> disturbing because it

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Rémi Zara
> Le 25 avr. 2017 à 01:47, Tom Lane a écrit : > > I wrote: >> What I'm inclined to do is to revert the pselect change but not the other, >> to see if that fixes these two animals. If it does, we could look into >> blacklisting these particular platforms when choosing

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Tom Lane
I wrote: > What I'm inclined to do is to revert the pselect change but not the other, > to see if that fixes these two animals. If it does, we could look into > blacklisting these particular platforms when choosing pselect. It looks like coypu is going to need manual intervention (ie, kill -9 on

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Tom Lane
Andres Freund writes: > On 2017-04-24 18:14:41 -0400, Tom Lane wrote: >> A bit of googling establishes that NetBSD 5.1 has a broken pselect >> implementation: >> >> http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=43625 > Yikes. Do I understand correctly that they

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Andres Freund
On 2017-04-24 18:14:41 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-04-24 17:33:39 -0400, Tom Lane wrote: > >> coypu's problem is unrelated: > > > Note I was linking the 9.6 report form coypu, not HEAD. Afaics the 9.6 > > failure is the same as gharial's mode of

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Tom Lane
Andres Freund writes: > On 2017-04-24 17:33:39 -0400, Tom Lane wrote: >> coypu's problem is unrelated: > Note I was linking the 9.6 report form coypu, not HEAD. Afaics the 9.6 > failure is the same as gharial's mode of failure. [ looks closer... ] Oh: the 9.6 run occurred

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Andres Freund
On 2017-04-24 17:33:39 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-04-24 13:16:44 -0700, Andres Freund wrote: > >> Unclear if related, but > >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial=2017-04-24%2019%3A30%3A42 > >> has a suspicious timing

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Tom Lane
Andres Freund writes: > On 2017-04-24 13:16:44 -0700, Andres Freund wrote: >> Unclear if related, but >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial=2017-04-24%2019%3A30%3A42 >> has a suspicious timing of failing in a weird way. > Given that gharial is

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Andres Freund
On 2017-04-24 13:16:44 -0700, Andres Freund wrote: > Unclear if related, but > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial=2017-04-24%2019%3A30%3A42 > has a suspicious timing of failing in a weird way. Given that gharial is also failing on 9.6 (same set of commits) and coypu

Re: [HACKERS] Unportable implementation of background worker start

2017-04-24 Thread Andres Freund
On 2017-04-21 23:50:41 -0400, Tom Lane wrote: > I wrote: > > Attached is a lightly-tested draft patch that converts the postmaster to > > use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions > > about whether this is the direction to proceed, though. > > Attached are a couple of

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
On 2017-04-21 23:50:41 -0400, Tom Lane wrote: > Attached are a couple of patches that represent a plausible Plan B. > The first one changes the postmaster to run its signal handlers without > specifying SA_RESTART. I've confirmed that that seems to fix the > select_parallel-test-takes-a-long-time

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
I wrote: > Attached is a lightly-tested draft patch that converts the postmaster to > use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions > about whether this is the direction to proceed, though. Attached are a couple of patches that represent a plausible Plan B. The first one

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> It also appears to me that do_start_bgworker's treatment of fork >> failure is completely brain dead. Did anyone really think about >> that case? > Hmm, I probably modelled it on autovacuum without giving that case much >

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Tom Lane writes: > Andres Freund writes: >> On 2017-04-21 14:08:21 -0400, Tom Lane wrote: >>> but I see that SUSv2 >>> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own >>> coding rules it ought to be okay to assume they're there.

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Andres Freund writes: > On 2017-04-21 14:08:21 -0400, Tom Lane wrote: >> but I see that SUSv2 >> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own >> coding rules it ought to be okay to assume they're there. I'm tempted to >> rip out the quoted bit, as

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
Tom, On 2017-04-21 13:49:27 -0400, Tom Lane wrote: > >> * If we are in PM_WAIT_DEAD_END state, then we don't want to accept > >> - * any new connections, so we don't call select(), and just > >> sleep. > >> + * any new connections, so we don't call WaitEventSetWait(), > >>

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
On 2017-04-21 14:08:21 -0400, Tom Lane wrote: > but I see that SUSv2 > mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own > coding rules it ought to be okay to assume they're there. I'm tempted to > rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see >

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
I wrote: > Andres Freund writes: >> On 2017-04-21 12:50:04 -0400, Tom Lane wrote: >>> +#ifndef FD_CLOEXEC >>> +#define FD_CLOEXEC 1 >>> +#endif >> Hm? Are we sure this is portable? Is there really cases that have >> F_SETFD, but not CLOEXEC? > Copied-and-pasted from our

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Andres Freund writes: > On 2017-04-21 12:50:04 -0400, Tom Lane wrote: >> Attached is a lightly-tested draft patch that converts the postmaster to >> use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions >> about whether this is the direction to proceed,

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> I'm coming back to the idea that at least in the back branches, the >> thing to do is allow maybe_start_bgworker to start multiple workers. >> >> Is there any actual evidence for the claim that that might have >> bad side

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
Hi, On 2017-04-21 12:50:04 -0400, Tom Lane wrote: > Attached is a lightly-tested draft patch that converts the postmaster to > use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions > about whether this is the direction to proceed, though. It adds at least > a couple of kernel

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Attached is a lightly-tested draft patch that converts the postmaster to use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions about whether this is the direction to proceed, though. It adds at least a couple of kernel calls per postmaster signal delivery, and probably to every

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Alvaro Herrera
Tom Lane wrote: > After sleeping and thinking more, I've realized that the > slow-bgworker-start issue actually exists on *every* platform, it's just > harder to hit when select() is interruptable. But consider the case > where multiple bgworker-start requests arrive while ServerLoop is >

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Andres Freund writes: > On 2017-04-20 00:50:13 -0400, Tom Lane wrote: >> My first reaction was that that sounded like a lot more work than removing >> two lines from maybe_start_bgworker and adjusting some comments. But on >> closer inspection, the slow-bgworker-start issue

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 20:10:41 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-04-20 20:05:02 -0400, Tom Lane wrote: > >> Also, if it's not there we'd fall back to using plain poll(), which is > >> not so awful that we need to work hard to avoid it. I'd just as soon > >>

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
Andres Freund writes: > On 2017-04-20 20:05:02 -0400, Tom Lane wrote: >> Also, if it's not there we'd fall back to using plain poll(), which is >> not so awful that we need to work hard to avoid it. I'd just as soon >> keep the number of combinations down. > Just using

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 20:05:02 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-04-20 19:53:02 -0400, Tom Lane wrote: > >> So ... what would you say to replacing epoll_create() with > >> epoll_create1(EPOLL_CLOEXEC) ? Then a WaitEventSet would not > >> represent

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
Andres Freund writes: > On 2017-04-20 19:53:02 -0400, Tom Lane wrote: >> So ... what would you say to replacing epoll_create() with >> epoll_create1(EPOLL_CLOEXEC) ? Then a WaitEventSet would not >> represent inheritable-across-exec resources on any platform, >> making it a

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 19:53:02 -0400, Tom Lane wrote: > I wrote: > > Andres Freund writes: > >> On 2017-04-20 19:23:28 -0400, Tom Lane wrote: > >>> or are the HANDLEs in a Windows WaitEventSet not inheritable > >>> resources? > > >> So that kind of sounds like it should be doable. >

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
I wrote: > Andres Freund writes: >> On 2017-04-20 19:23:28 -0400, Tom Lane wrote: >>> or are the HANDLEs in a Windows WaitEventSet not inheritable >>> resources? >> So that kind of sounds like it should be doable. > Ah, good. I'll add a comment about that and press on. So

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 00:50:13 -0400, Tom Lane wrote: > I wrote: > > Alvaro Herrera writes: > >> Tom Lane wrote: > >>> Hm. Do you have a more-portable alternative? > > >> I was thinking in a WaitEventSet from latch.c. > > My first reaction was that that sounded like a lot

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
Andres Freund writes: > On 2017-04-20 19:23:28 -0400, Tom Lane wrote: >> or are the HANDLEs in a Windows WaitEventSet not inheritable >> resources? > I think we have control over that. According to >

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Andres Freund
On 2017-04-20 19:23:28 -0400, Tom Lane wrote: > or are the HANDLEs in a Windows WaitEventSet not inheritable > resources? I think we have control over that. According to https://msdn.microsoft.com/en-us/library/windows/desktop/ms724466(v=vs.85).aspx CreateProcess() has to be called with

Re: [HACKERS] Unportable implementation of background worker start

2017-04-20 Thread Tom Lane
I wrote: > Alvaro Herrera writes: >> Tom Lane wrote: >>> Hm. Do you have a more-portable alternative? >> I was thinking in a WaitEventSet from latch.c. > My first reaction was that that sounded like a lot more work than removing > two lines from maybe_start_bgworker

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
I wrote: > Alvaro Herrera writes: >> Tom Lane wrote: >>> Hm. Do you have a more-portable alternative? >> I was thinking in a WaitEventSet from latch.c. My first reaction was that that sounded like a lot more work than removing two lines from maybe_start_bgworker and

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
Andres Freund writes: > FWIW, I'd wished before that we used something a bit more modern than > select() if available... It's nice to be able to listen to a larger > number of sockets without repeated O(sockets) overhead. [ raised eyebrow... ] Is anyone really running

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Andres Freund
On 2017-04-19 18:56:26 -0400, Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> Hm. Do you have a more-portable alternative? > > > I was thinking in a WaitEventSet from latch.c. > > Yeah, some googling turns up the suggestion that a self-pipe is a

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> Hm. Do you have a more-portable alternative? > I was thinking in a WaitEventSet from latch.c. Yeah, some googling turns up the suggestion that a self-pipe is a portable way to get consistent semantics from select(); latch.c

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> So I'm wondering what the design rationale was for only starting one > >> bgworker per invocation. > > > The rationale was that there may be other tasks waiting for postmaster > > attention, and if there

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> So I'm wondering what the design rationale was for only starting one >> bgworker per invocation. > The rationale was that there may be other tasks waiting for postmaster > attention, and if there are many bgworkers needing to

Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Alvaro Herrera
Tom Lane wrote: > While I haven't yet tested it, it seems like a fix might be as simple > as deleting these lines in maybe_start_bgworker: > > /* > * Have ServerLoop call us again. Note that there might not > * actually *be* another runnable worker, but we

[HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
I chanced to notice that on gaur/pademelon, the "select_parallel" regression test sometimes takes a great deal longer than normal, for no obvious reason. It does eventually terminate, but sometimes only after 10 to 20 minutes rather than the couple dozen seconds that are typical on that slow