On Tue, 9 Jul 2019 22:22:47 +0200 Tiemen Ruiten <t.rui...@tech-lab.io> wrote:
> On Tue, Jul 9, 2019 at 4:21 PM Jehan-Guillaume de Rorthais
> <j...@dalibo.com> wrote:
>
> > On Tue, 9 Jul 2019 13:22:06 +0200
> > Tiemen Ruiten <t.rui...@tech-lab.io> wrote:
> >
> > > On Mon, Jul 8, 2019 at 10:01 PM Jehan-Guillaume de Rorthais
> > > <j...@dalibo.com>
> > ...
> > > > I dig in xlog.c today. Maybe I can write a small extension to get
> > > > the timeline from shared memory directly and make pgsqlms use it if
> > > > it detects it. So people can decide if they feel like it is too
> > > > invasive or really needed for their usecase. Maybe in next release.
> > > > What do you think? Would it be useful to you?
> > >
> > > Yes, that would be a really useful addition IMO. I would definitely
> > > use it. If we can avoid taking a checkpoint that will save precious
> > > minutes during a failover and the risk of timeouts would be
> > > drastically reduced. Would be happy to test it if you want!
> >
> > OK, thanks. Not sure when I'll have time to work on this. But I'll stay
> > in touch with you then.
>
> Great!
>
> > I have to work on the v12 support as well :/
> >
> > > > > I managed to improve the average time checkpoints are taking
> > > > > already from what I mentioned in that thread, mainly by decreasing
> > > > > checkpoint_timeout and setting full_page_writes = off; ostensibly
> > > > > not necessary on ZFS.
> > > >
> > > > The "full_page_writes" helps lowering the amount of WAL produced.
> > > > Not the amount of writes to sync during the checkpoint. But I am
> > > > sure it helps for your performances :)
> > >
> > > If I'm saturating the IO capacity of my system during a forced
> > > checkpoint and full_page_writes = off reduces IO by reducing the
> > > amount of WAL, then it should help in an indirect way?
> >
> > The master is supposed to be gone during a failover, neither in reads
> > or writes.
>
> OK, I didn't consider this.
> > The checkpoint occurs on each standby to force sync their
> > controldata. The checkpoint itself does not writes to WALs or read
> > them. Am I forgetting something obvious?
> >
> > Maybe you can have some writes if the standby need to sync last
> > received WALs and some reads if the standby was lagging on replay...
> > But it shouldn't be much...
>
> I double-checked monitoring data: there was approximately one minute of
> replication lag on one slave and two minutes of replication lag on the
> other slave when the original issue occurred.

Which lag? The current primary LSN versus the sent, received, synced or
replayed LSN?

> By the way, I'm still seeing worrying amounts of replication lag on both
> slaves at times (usually not on both at the same time) so that's really
> puzzling: all hardware and configuration is identical.

Same question. Which metric are you looking at exactly?

> Anyway, that's something for another thread/mailinglist I suppose :)

Indeed. This should be discussed on pgsql-general rather than on
clusterlabs :)
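
To make the "which lag?" question concrete, here is a minimal sketch of how
the different lags can be told apart, assuming PostgreSQL 10 or later, where
pg_stat_replication exposes sent_lsn, write_lsn, flush_lsn and replay_lsn and
the primary reports its position with pg_current_wal_lsn(). The function
names and the LSN values below are made up for illustration; they are not
part of pgsqlms or PAF. I am mapping "received" to write_lsn and "synced" to
flush_lsn here, which is my reading of the terms.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a byte position.

    An LSN is printed as two hexadecimal numbers: the high 32 bits,
    a slash, then the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lags(current: str, sent: str, write: str,
                     flush: str, replay: str) -> dict:
    """Return each lag in bytes, measured against the primary's current LSN.

    Hypothetical helper: the arguments are meant to be the output of
    pg_current_wal_lsn() and the sent_lsn, write_lsn, flush_lsn and
    replay_lsn columns of pg_stat_replication for one standby.
    """
    now = lsn_to_int(current)
    return {
        "sent": now - lsn_to_int(sent),      # not yet sent by the primary
        "write": now - lsn_to_int(write),    # not yet received/written
        "flush": now - lsn_to_int(flush),    # not yet synced to disk
        "replay": now - lsn_to_int(replay),  # not yet replayed
    }


# Made-up values: the standby has received almost everything
# but is about 16 MB behind on replay.
lags = replication_lags(
    current="0/3000060",
    sent="0/3000060",
    write="0/3000000",
    flush="0/3000000",
    replay="0/2000000",
)
for name, lag_bytes in lags.items():
    print(f"{name:>6}: {lag_bytes} bytes")
```

A large "replay" lag with a small "flush" lag means the WAL already reached
the standby and is only waiting to be applied, which is a very different
situation from a standby that has not received the data at all.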