Re: [HACKERS] More race conditions in logical replication

2017-08-12 Thread Michael Paquier
On Sun, Aug 13, 2017 at 10:25 AM, Noah Misch wrote: > Committed. These counts broke three times in the v10 release cycle. It's too > bad this defect doesn't cause an error when building the docs. That's another argument for generating the table dynamically. Thanks for the

Re: [HACKERS] More race conditions in logical replication

2017-08-12 Thread Noah Misch
On Sat, Aug 12, 2017 at 05:38:29PM +0900, Michael Paquier wrote: > On Thu, Aug 10, 2017 at 4:22 PM, Michael Paquier > wrote: > > On Tue, Aug 8, 2017 at 8:11 PM, Alvaro Herrera > > wrote: > >> Here's a patch. It turned to be a bit larger than

Re: [HACKERS] More race conditions in logical replication

2017-08-12 Thread Michael Paquier
On Thu, Aug 10, 2017 at 4:22 PM, Michael Paquier wrote: > On Tue, Aug 8, 2017 at 8:11 PM, Alvaro Herrera > wrote: >> Here's a patch. It turned to be a bit larger than I initially expected. > > Álvaro, 030273b7 did not get things completely

Re: [HACKERS] More race conditions in logical replication

2017-08-10 Thread Michael Paquier
On Tue, Aug 8, 2017 at 8:11 PM, Alvaro Herrera wrote: > Robert Haas wrote: >> On Mon, Aug 7, 2017 at 8:14 PM, Alvaro Herrera >> wrote: >> > BTW, I noticed that the PG_WAIT_LOCK value that we're using for wait >> > event here (and in the

Re: [HACKERS] More race conditions in logical replication

2017-08-08 Thread Alvaro Herrera
Robert Haas wrote: > On Mon, Aug 7, 2017 at 8:14 PM, Alvaro Herrera > wrote: > > BTW, I noticed that the PG_WAIT_LOCK value that we're using for wait > > event here (and in the replication slot case) is bogus. We probably > > need something new here. > > Yeah, if

Re: [HACKERS] More race conditions in logical replication

2017-08-07 Thread Robert Haas
On Mon, Aug 7, 2017 at 8:14 PM, Alvaro Herrera wrote: > BTW, I noticed that the PG_WAIT_LOCK value that we're using for wait > event here (and in the replication slot case) is bogus. We probably > need something new here. Yeah, if you're adding a new wait point, you

Re: [HACKERS] More race conditions in logical replication

2017-08-07 Thread Alvaro Herrera
Alvaro Herrera wrote: > Alvaro Herrera wrote: > > I just noticed that Jacana failed again in the subscription test, and it > > looks like it's related to this. I'll take a look tomorrow if no one > > beats me to it. > >

Re: [HACKERS] More race conditions in logical replication

2017-07-27 Thread Alvaro Herrera
Alvaro Herrera wrote: > Alvaro Herrera wrote: > > > Hmm, yeah, that's not good. However, I didn't like the idea of putting > > it inside the locked area, as it's too much code. Instead I added just > > before acquiring the spinlock. We cancel the sleep unconditionally > > later on if we didn't

Re: [HACKERS] More race conditions in logical replication

2017-07-26 Thread Alvaro Herrera
Alvaro Herrera wrote: > Hmm, yeah, that's not good. However, I didn't like the idea of putting > it inside the locked area, as it's too much code. Instead I added just > before acquiring the spinlock. We cancel the sleep unconditionally > later on if we didn't need to sleep after all. I just

Re: [HACKERS] More race conditions in logical replication

2017-07-25 Thread Robert Haas
On Tue, Jul 25, 2017 at 5:47 AM, Petr Jelinek wrote: > As a side note, the ConditionVariablePrepareToSleep()'s comment could be > improved because currently it says the only advantage is that we skip > double-test in the beginning of ConditionVariableSleep(). But

Re: [HACKERS] More race conditions in logical replication

2017-07-25 Thread Robert Haas
On Tue, Jul 25, 2017 at 1:42 PM, Alvaro Herrera wrote: > As I see it, we need to backpatch at least parts of this patch. I've > received reports that in earlier branches pglogical and BDR can > sometimes leave slots behind when removing nodes, and I have a hunch > that

Re: [HACKERS] More race conditions in logical replication

2017-07-25 Thread Alvaro Herrera
Petr Jelinek wrote: > On 25/07/17 01:33, Alvaro Herrera wrote: > > Alvaro Herrera wrote: > >> I'm back at looking into this again, after a rather exhausting week. I > >> think I have a better grasp of what is going on in this code now, > > > > Here's an updated patch, which I intend to push

Re: [HACKERS] More race conditions in logical replication

2017-07-25 Thread Petr Jelinek
On 25/07/17 01:33, Alvaro Herrera wrote: > Alvaro Herrera wrote: >> I'm back at looking into this again, after a rather exhausting week. I >> think I have a better grasp of what is going on in this code now, > > Here's an updated patch, which I intend to push tomorrow. > I think the

Re: [HACKERS] More race conditions in logical replication

2017-07-24 Thread Alvaro Herrera
Alvaro Herrera wrote: > I'm back at looking into this again, after a rather exhausting week. I > think I have a better grasp of what is going on in this code now, Here's an updated patch, which I intend to push tomorrow. > and it > appears to me that we should change the locking so that

Re: [HACKERS] More race conditions in logical replication

2017-07-21 Thread Alvaro Herrera
I'm back at looking into this again, after a rather exhausting week. I think I have a better grasp of what is going on in this code now, and it appears to me that we should change the locking so that active_pid is protected by ReplicationSlotControlLock instead of the slot's spinlock; but I

Re: [HACKERS] More race conditions in logical replication

2017-07-18 Thread Alvaro Herrera
Alvaro Herrera wrote: > After going over this a bunch more times, I found other problems. For > example, I noticed that if I create a temporary slot in one session, > then another session would rightly go to sleep if it tries to drop that > slot. But the slot goes away when the first session

Re: [HACKERS] More race conditions in logical replication

2017-07-15 Thread Alvaro Herrera
After going over this a bunch more times, I found other problems. For example, I noticed that if I create a temporary slot in one session, then another session would rightly go to sleep if it tries to drop that slot. But the slot goes away when the first session disconnects, so I'd expect the

Re: [HACKERS] More race conditions in logical replication

2017-07-14 Thread Noah Misch
On Wed, Jul 12, 2017 at 06:48:28PM -0400, Alvaro Herrera wrote: > Petr Jelinek wrote: > > > So best idea I could come up with is to make use of the new condition > > variable API. That lets us wait for variable which can be set per slot. > > Here's a cleaned up version of that patch, which I

Re: [HACKERS] More race conditions in logical replication

2017-07-12 Thread Alvaro Herrera
Petr Jelinek wrote: > So best idea I could come up with is to make use of the new condition > variable API. That lets us wait for variable which can be set per slot. Here's a cleaned up version of that patch, which I intend to get in the tree tomorrow. I verified that this allows the

Re: [HACKERS] More race conditions in logical replication

2017-07-10 Thread Alvaro Herrera
Alvaro Herrera wrote: > I'll next update this on or before Monday 10th at 19:00 CLT. I couldn't get to this today as I wanted. Next update on Wednesday 12th, same time. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training &

Re: [HACKERS] More race conditions in logical replication

2017-07-07 Thread Tom Lane
Alvaro Herrera writes: > I'll next update this on or before Monday 10th at 19:00 CLT. Schedule note --- we'll be wrapping beta2 on Monday, a couple hours before that. Although it'd be great to have this issue fixed before beta2, jamming in a patch just a few hours

Re: [HACKERS] More race conditions in logical replication

2017-07-07 Thread Alvaro Herrera
Petr Jelinek wrote: > So best idea I could come up with is to make use of the new condition > variable API. That lets us wait for variable which can be set per slot. > > It's not backportable however, I am not sure how big problem that is > considering the lack of complaints until now (maybe we

Re: [HACKERS] More race conditions in logical replication

2017-07-07 Thread Petr Jelinek
On 06/07/17 18:20, Petr Jelinek wrote: > On 06/07/17 17:33, Petr Jelinek wrote: >> On 03/07/17 01:54, Tom Lane wrote: >>> I noticed a recent failure that looked suspiciously like a race condition: >>> >>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet=2017-07-02%2018%3A02%3A07 >>>

Re: [HACKERS] More race conditions in logical replication

2017-07-06 Thread Petr Jelinek
On 06/07/17 17:33, Petr Jelinek wrote: > On 03/07/17 01:54, Tom Lane wrote: >> I noticed a recent failure that looked suspiciously like a race condition: >> >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet=2017-07-02%2018%3A02%3A07 >> >> The critical bit in the log file is >> >>

Re: [HACKERS] More race conditions in logical replication

2017-07-06 Thread Petr Jelinek
On 03/07/17 01:54, Tom Lane wrote: > I noticed a recent failure that looked suspiciously like a race condition: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet=2017-07-02%2018%3A02%3A07 > > The critical bit in the log file is > > error running SQL: 'psql::1: ERROR: could not

Re: [HACKERS] More race conditions in logical replication

2017-07-05 Thread Alvaro Herrera
Noah Misch wrote: > The above-described topic is currently a PostgreSQL 10 open item. Peter, > since you committed the patch believed to have created it, you own this open > item. I volunteer to own this item. My next update is going to be on or before Friday 7th at 19:00 Chilean time, though

Re: [HACKERS] More race conditions in logical replication

2017-07-03 Thread Noah Misch
On Sun, Jul 02, 2017 at 07:54:48PM -0400, Tom Lane wrote: > I noticed a recent failure that looked suspiciously like a race condition: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet=2017-07-02%2018%3A02%3A07 > > The critical bit in the log file is > > error running SQL:

[HACKERS] More race conditions in logical replication

2017-07-02 Thread Tom Lane
I noticed a recent failure that looked suspiciously like a race condition: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet=2017-07-02%2018%3A02%3A07 The critical bit in the log file is error running SQL: 'psql::1: ERROR: could not drop the replication slot "tap_sub" on