Incorrect snapshots while promoting hot standby node when 2PC is used

2021-04-22 Thread Andres Freund
Hi, Michael Paquier (running locally I think), and subsequently Thomas Munro (noticing [1]), privately reported that they noticed an assertion failure in GetSnapshotData(). Both reasonably were wondering if that's related to the snapshot scalability patches. Michael reported the following asserti

Re: Incorrect snapshots while promoting hot standby node when 2PC is used

2021-09-30 Thread Michael Paquier
On Mon, May 31, 2021 at 09:37:17PM +0900, Michael Paquier wrote: > I have been looking at all that for the last couple of days, and > checked the code to make sure that relying on RecoveryInProgress() as > the tipping point is logically correct in terms of virtual XID, > snapshot build and KnownAss

Re: Incorrect snapshots while promoting hot standby node when 2PC is used

2021-10-04 Thread Michael Paquier
On Fri, Oct 01, 2021 at 02:11:15PM +0900, Michael Paquier wrote: > A couple of months later, I have looked back at this thread and this > report. I have rechecked all the standby handling and snapshot builds > involving KnownAssignedXids and looked at all the phases that are > getting called until

Re: Incorrect snapshots while promoting hot standby node when 2PC is used

2021-10-05 Thread Andres Freund
On 2021-10-04 17:27:44 +0900, Michael Paquier wrote: > On Fri, Oct 01, 2021 at 02:11:15PM +0900, Michael Paquier wrote: > > A couple of months later, I have looked back at this thread and this > > report. I have rechecked all the standby handling and snapshot builds > > involving KnownAssignedXids

Re: Incorrect snapshots while promoting hot standby node when 2PC is used

2021-05-01 Thread Andrey Borodin
Hi Andres! > On 23 Apr 2021, at 01:36, Andres Freund wrote: > > So snapshots within that window will always be "empty", i.e. xmin is > latestCompletedXid and xmax is latestCompletedXid + 1. Once we reach 3), we'll > look at the procarray, which then leads xmin going back to 588. > > > I t

Re: Incorrect snapshots while promoting hot standby node when 2PC is used

2021-05-03 Thread Andres Freund
Hi, On 2021-05-01 17:35:09 +0500, Andrey Borodin wrote: > I'm investigating somewhat resemblant case. > We have an OLTP sharded installation where shards are almost always under > rebalancing. Data movement is implemented with 2pc. > Switchover happens quite often due to datacenter drills. The in

Re: Incorrect snapshots while promoting hot standby node when 2PC is used

2021-05-03 Thread Andrey Borodin
> On 3 May 2021, at 23:10, Andres Freund wrote: > > Hi, > > On 2021-05-01 17:35:09 +0500, Andrey Borodin wrote: >> I'm investigating somewhat resemblant case. >> We have an OLTP sharded installation where shards are almost always under >> rebalancing. Data movement is implemented with 2p

Re: Incorrect snapshots while promoting hot standby node when 2PC is used

2021-05-04 Thread Tom Lane
Andres Freund writes: > Michael Paquier (running locally I think), and subsequently Thomas Munro > (noticing [1]), privately reported that they noticed an assertion failure in > GetSnapshotData(). Both reasonably were wondering if that's related to the > snapshot scalability patches. > Michael rep

Re: Incorrect snapshots while promoting hot standby node when 2PC is used

2021-05-04 Thread Andres Freund
Hi, On 2021-05-04 12:32:34 -0400, Tom Lane wrote: > Andres Freund writes: > > Michael Paquier (running locally I think), and subsequently Thomas Munro > > (noticing [1]), privately reported that they noticed an assertion failure in > > GetSnapshotData(). Both reasonably were wondering if that's r

Re: Incorrect snapshots while promoting hot standby node when 2PC is used

2021-05-26 Thread Michael Paquier
On Thu, Apr 22, 2021 at 01:36:03PM -0700, Andres Freund wrote: > The sequence in StartupXLOG() leading to the issue is the following: > > 1) RecoverPreparedTransactions(); > 2) ShutdownRecoveryTransactionEnvironment(); > 3) XLogCtl->SharedRecoveryState = RECOVERY_STATE_DONE; > > Because 2) resets
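The three-step sequence quoted above, combined with the "empty snapshot" behaviour described elsewhere in this thread, can be illustrated with a small toy model. This is only a sketch and not PostgreSQL code: the class `ToyNode`, its fields, and the value 589 for the latest completed XID are hypothetical. The model assumes that step 2 leaves the recovery-side XID tracking empty, and that a snapshot consults the procarray only once recovery is marked done. The XID 588 is the prepared transaction mentioned in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a promoting standby; not PostgreSQL's real state."""
    latest_completed_xid: int
    known_assigned_xids: set = field(default_factory=set)  # recovery-side tracking
    procarray_xids: list = field(default_factory=list)     # incl. 2PC dummy entries
    in_recovery: bool = True

    def snapshot(self):
        """Return (xmin, xmax) from whichever source the current state consults."""
        xmax = self.latest_completed_xid + 1
        running = self.known_assigned_xids if self.in_recovery else self.procarray_xids
        # With nothing running, the snapshot is "empty": xmin = latest completed XID.
        xmin = min(running, default=self.latest_completed_xid)
        return xmin, xmax


def simulate_promotion():
    node = ToyNode(latest_completed_xid=589, known_assigned_xids={588})
    snaps = {"during_recovery": node.snapshot()}

    node.procarray_xids.append(588)   # 1) prepared xact gets a procarray entry
    node.known_assigned_xids.clear()  # 2) assumed effect of shutting down the
                                      #    recovery transaction environment
    snaps["window_2_to_3"] = node.snapshot()

    node.in_recovery = False          # 3) recovery state marked done
    snaps["after_promotion"] = node.snapshot()
    return snaps


if __name__ == "__main__":
    for phase, snap in simulate_promotion().items():
        print(phase, snap)
```

In this model, a snapshot taken between steps 2 and 3 reports a higher xmin than one taken after step 3. That is the backwards movement of xmin the thread describes.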

Re: Incorrect snapshots while promoting hot standby node when 2PC is used

2021-05-27 Thread Andres Freund
Hi, On 2021-05-26 16:57:31 +0900, Michael Paquier wrote: > Yes, there should not be any as far as I recall. 2PC is kind of > special with its fake ProcArray entries. It's really quite an awful design :( > > I think to fix the issue we'd have to move > > ShutdownRecoveryTransactionEnvironment()

Re: Incorrect snapshots while promoting hot standby node when 2PC is used

2021-05-31 Thread Michael Paquier
On Thu, May 27, 2021 at 10:01:49AM -0700, Andres Freund wrote: > Why would it be intrusive? We're talking a split second here, no? More > importantly, I don't think it's correct to release the locks at that > point. I have been looking at all that for the last couple of days, and checked the code