On Wed, Sep 3, 2025 at 9:47 AM Andrey Borodin wrote:
>
> > On 3 Sep 2025, at 11:37, Alexander Korotkov wrote:
> >
> > Could you, please, recheck?
>
> That patch also adds CondVar sleep in critical section. That patch is how we
> understood that such sleep is dangerous.
>
> Actual patch to detect
> On 3 Sep 2025, at 11:37, Alexander Korotkov wrote:
>
> Could you, please, recheck?
That patch also adds CondVar sleep in critical section. That patch is how we
understood that such sleep is dangerous.
Actual patch to detect a problem is much simpler:
```
diff --git a/src/backend/storage/
On Tue, Aug 12, 2025 at 8:38 AM Kirill Reshke
wrote:
> On Wed, 6 Aug 2025 at 20:00, Andrey Borodin wrote:
> >
> > Hi hackers!
> >
> > I was reviewing the patch about removing xl_heap_visible and found the
VM\WAL machinery very interesting.
> > At Yandex we had several incidents with corrupted VM
On Mon, Aug 25, 2025 at 10:07:26AM -0500, Nathan Bossart wrote:
> Now that this is reverted, can the related open item be marked as resolved?
Since there has been no further discussion, I will go ahead and resolve the
open item.
--
nathan
[RMT hat]
On Thu, Aug 21, 2025 at 06:42:48PM -0400, Tom Lane wrote:
> Alexander Korotkov writes:
>> On Tue, Aug 19, 2025 at 10:50 PM Tom Lane wrote:
>>> Therefore, I vote for reverting bc22dc0e0. Hopefully only
>>> temporarily, but it's too late to figure out another way for v18,
>>> and I don'
On Fri, Aug 22, 2025 at 10:27 AM Alexander Korotkov
wrote:
> And let's retry it for v19.
+1
I'm hoping we can fix PM death handling soon, and then I assume this
can go straight back in without modification. CVs are an essential
low level synchronisation component that really should work in lots
Hi, Tom!
On Tue, Aug 19, 2025 at 10:50 PM Tom Lane wrote:
> I'm inclined to think that we do want to prohibit WaitEventSetWait
> inside a critical section --- it just seems like a bad idea all
> around, even without considering this specific failure mode.
> Therefore, I vote for reverting bc22dc0
Alexander Korotkov writes:
> On Tue, Aug 19, 2025 at 10:50 PM Tom Lane wrote:
>> Therefore, I vote for reverting bc22dc0e0. Hopefully only
>> temporarily, but it's too late to figure out another way for v18,
>> and I don't think that bc22dc0e0 is such an essential improvement
>> that we can't af
On Fri, Aug 22, 2025 at 01:27:17AM +0300, Alexander Korotkov wrote:
> I'm OK about this. Do you mind if I revert bc22dc0e0 myself?
> And let's retry it for v19.
Yes, agreed that it may be the best thing to do for v18 based on
the information we have gathered until now.
--
Michael
On Wed, Aug 20, 2025 at 3:47 PM Tom Lane wrote:
> Having said that, we should in any case have a better story on
> what WaitEventSetWait should do after detecting postmaster death.
> So I'm all for trying to avoid the proc_exit path if we can
> design a better answer.
Yeah. I've posted a concept
> On 20 Aug 2025, at 00:55, Tom Lane wrote:
>
> Andrey Borodin writes:
>> I believe there is a bug with PageIsAllVisible(page) &&
>> visibilitymap_clear(). But I cannot prove it with an injection point test.
>> Because injections points rely on CondVar, that per se creates corruption in
>>
On Wed, Aug 20, 2025 at 09:14:04AM -0400, Andres Freund wrote:
> On 2025-08-19 23:47:21 -0400, Tom Lane wrote:
>> Hm. It still makes me mighty uncomfortable, because the point of a
>> critical section is "crash the database if anything goes wrong during
>> this bit". Waiting for another process -
Hi,
On 2025-08-19 23:47:21 -0400, Tom Lane wrote:
> Thomas Munro writes:
> > On Wed, Aug 20, 2025 at 7:50 AM Tom Lane wrote:
> >> I'm inclined to think that we do want to prohibit WaitEventSetWait
> >> inside a critical section --- it just seems like a bad idea all
> >> around, even without cons
On Wed, 20 Aug 2025 at 00:50, Tom Lane wrote:
>
> Kirill Reshke writes:
> > I revert this commit (these were conflicts but i resolved them) and
> > added assert for crit sections in WaitEventSetWait.
>
> Your patch still contains some conflict markers :-(. Attached is
> a corrected version, jus
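For readers skimming the thread: the guard being discussed amounts to an
assertion at the top of WaitEventSetWait() in src/backend/storage/ipc/latch.c.
A minimal sketch of the idea (not the attached patch) would be:
```c
/*
 * Sketch only, not the attached patch: refuse to wait on a WaitEventSet
 * while inside a critical section.  CritSectionCount is from miscadmin.h.
 */
int
WaitEventSetWait(WaitEventSet *set, long timeout,
                 WaitEvent *occurred_events, int nevents,
                 uint32 wait_event_info)
{
    Assert(CritSectionCount == 0);  /* new guard: no latch/CV sleeps in crit sections */

    /* ... existing wait loop unchanged ... */
}
```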
Thomas Munro writes:
> On Wed, Aug 20, 2025 at 7:50 AM Tom Lane wrote:
>> I'm inclined to think that we do want to prohibit WaitEventSetWait
>> inside a critical section --- it just seems like a bad idea all
>> around, even without considering this specific failure mode.
> FWIW aio/README.md des
On Wed, Aug 20, 2025 at 11:59 AM Thomas Munro wrote:
> they can't all be
> blocked in sig_wait() unless there is already a deadlock.
s/sig_wait()/sem_wait()/
On Wed, Aug 20, 2025 at 7:50 AM Tom Lane wrote:
> I'm inclined to think that we do want to prohibit WaitEventSetWait
> inside a critical section --- it just seems like a bad idea all
> around, even without considering this specific failure mode.
FWIW aio/README.md describes a case where we'd need
On Tue, Aug 19, 2025 at 03:55:41PM -0400, Tom Lane wrote:
> Yeah, I was coming to similar conclusions in the reply I just sent:
> we don't really want a policy that we can't put injection-point-based
> delays inside critical sections. So that infrastructure is leaving
> something to be desired.
Y
Andrey Borodin writes:
> I believe there is a bug with PageIsAllVisible(page) &&
> visibilitymap_clear(). But I cannot prove it with an injection point test.
> Because injections points rely on CondVar, that per se creates corruption in
> critical section. So I'm reading this discussion and won
Kirill Reshke writes:
> On Tue, 19 Aug 2025 at 10:32, Thomas Munro wrote:
>> Any idea involving deferring the handling of PM death from here
>> doesn't seem right: you'd keep waiting for the CV, but the backend
>> that would wake you might have exited.
Yeah. Taking the check for PM death out of
> On 19 Aug 2025, at 23:23, Kirill Reshke wrote:
>
>> We'd probably be best off to get back to the actual bug the
>> thread started with, namely whether we aren't doing the wrong
>> thing with VM-update order of operations.
>>
>>regards, tom lane
>
> My understanding
On Tue, 19 Aug 2025 at 23:08, Tom Lane wrote:
>
> Kirill Reshke writes:
> > On Tue, 19 Aug 2025 at 21:16, Yura Sokolov wrote:
> >> `if (CritSectionCount != 0) _exit(2) else proc_exit(1)` in
> >> WaitEventSetWaitBlock () solves the issue of inconsistency IF POSTMASTER IS
> >> SIGKILLED, and doesn
Kirill Reshke writes:
> On Tue, 19 Aug 2025 at 21:16, Yura Sokolov wrote:
>> `if (CritSectionCount != 0) _exit(2) else proc_exit(1)` in
>> WaitEventSetWaitBlock () solves the issue of inconsistency IF POSTMASTER IS
>> SIGKILLED, and doesn't lead to any problem, if postmaster is not SIGKILL-ed
>>
On Tue, 19 Aug 2025 at 20:24, Andres Freund wrote:
>
> Hi,
>
> On 2025-08-20 03:19:38 +1200, Thomas Munro wrote:
> > On Wed, Aug 20, 2025 at 2:57 AM Andres Freund wrote:
> > > On 2025-08-20 02:54:09 +1200, Thomas Munro wrote:
> > > > > On linux - the primary OS with OOM killer troubles - I'm pret
On Tue, 19 Aug 2025 at 21:16, Yura Sokolov wrote:
>
>
> `if (CritSectionCount != 0) _exit(2) else proc_exit(1)` in
> WaitEventSetWaitBlock () solves the issue of inconsistency IF POSTMASTER IS
> SIGKILLED, and doesn't lead to any problem, if postmaster is not SIGKILL-ed
> (since postmaster will S
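Spelled out, the proposal is roughly this change in the postmaster-death
handling of WaitEventSetWaitBlock(); a sketch of the idea, not a tested patch:
```c
/* Sketch of the proposal above; condensed, not verbatim source. */
if (!PostmasterIsAliveInternal())
{
    if (set->exit_on_postmaster_death)
    {
        /*
         * proc_exit() runs shmem-exit callbacks (ProcKill() ->
         * LWLockReleaseAll(), etc.), which is unsafe while a critical
         * section may have shared state half-updated.  In that case,
         * leave immediately with no cleanup at all.
         */
        if (CritSectionCount != 0)
            _exit(2);
        proc_exit(1);
    }
}
```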
On Tue, 19 Aug 2025 at 21:16, Yura Sokolov wrote:
>
> That is not true.
> elog(PANIC) doesn't clear LWLocks. And XLogWrite, which could be called
> from AdvanceXLInsertBuffer, may call elog(PANIC) from several places.
>
> It doesn't lead to any error, because usually postmaster is alive and it
19.08.2025 16:43, Kirill Reshke wrote:
> On Tue, 19 Aug 2025 at 18:29, Yura Sokolov wrote:
>
>
>> Latch and ConditionVariable (that uses Latch) are among basic
>> synchronization primitives in PostgreSQL.
>
> Sure
>
>> Therefore they have to work correctly in any place: in critical section, in
Hi,
On 2025-08-20 03:19:38 +1200, Thomas Munro wrote:
> On Wed, Aug 20, 2025 at 2:57 AM Andres Freund wrote:
> > On 2025-08-20 02:54:09 +1200, Thomas Munro wrote:
> > > > On linux - the primary OS with OOM killer troubles - I'm pretty sure
> > > > lwlock
> > > > waiters would get killed due t
On Wed, Aug 20, 2025 at 2:57 AM Andres Freund wrote:
> On 2025-08-20 02:54:09 +1200, Thomas Munro wrote:
> > > On linux - the primary OS with OOM killer troubles - I'm pretty sure
> > > lwlock
> > > waiters would get killed due to the postmaster death signal we've
> > > configured
> > > (c.f.
Hi,
On 2025-08-20 02:54:09 +1200, Thomas Munro wrote:
> > On linux - the primary OS with OOM killer troubles - I'm pretty sure
> > lwlock
> > waiters would get killed due to the postmaster death signal we've configured
> > (c.f. PostmasterDeathSignalInit()).
>
> No, that has a handler that ju
On Wed, Aug 20, 2025 at 1:56 AM Andres Freund wrote:
> On 2025-08-19 02:13:43 -0400, Tom Lane wrote:
> > > Then wouldn't backends blocked in LWLockAcquire(x) hang forever, after
> > > someone who holds x calls _exit()?
> >
> > If someone who holds x is killed by (say) the OOM killer, how do
> > we
Hi,
On 2025-08-19 02:13:43 -0400, Tom Lane wrote:
> Thomas Munro writes:
> > On Tue, Aug 19, 2025 at 4:52 AM Tom Lane wrote:
> >> But I'm of the opinion that proc_exit
> >> is the wrong thing to use after seeing postmaster death, critical
> >> section or no. We should assume that system integri
On Tue, 19 Aug 2025 at 18:29, Yura Sokolov wrote:
> Latch and ConditionVariable (that uses Latch) are among basic
> synchronization primitives in PostgreSQL.
Sure
> Therefore they have to work correctly in any place: in critical section, in
> wal logging, etc.
No. Before bc22dc0e0ddc2dcb6043a
19.08.2025 16:17, Kirill Reshke wrote:
> On Tue, 19 Aug 2025 at 14:14, Kirill Reshke wrote:
>>
>> This thread is a candidate for [0]
>>
>>
>> [0]https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
>>
>
> Let me summarize this thread for ease of understanding what's going on:
>
> Timelin
19.08.2025 16:09, Andres Freund wrote:
> Hi,
>
> On 2025-08-19 15:56:05 +0300, Yura Sokolov wrote:
>> 09.08.2025 22:54, Kirill Reshke wrote:
>>> On Thu, 7 Aug 2025 at 21:36, Aleksander Alekseev
>>> wrote:
>>>
>>>> Perhaps there was a good
>>>> reason to update the VM *before* creating WAL records
On Tue, 19 Aug 2025 at 14:14, Kirill Reshke wrote:
>
> This thread is a candidate for [0]
>
>
> [0]https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
>
Let me summarize this thread for ease of understanding what's going on:
Timeline:
1) Andrey Borodin sends a patch (on 6 Aug) claiming
Hi,
On 2025-08-19 15:56:05 +0300, Yura Sokolov wrote:
> 09.08.2025 22:54, Kirill Reshke wrote:
> > On Thu, 7 Aug 2025 at 21:36, Aleksander Alekseev
> > wrote:
> >
> >> Perhaps there was a good
> >> reason to update the VM *before* creating WAL records I'm unaware of.
> >
> > Looks like 503c730 in
09.08.2025 22:54, Kirill Reshke wrote:
> On Thu, 7 Aug 2025 at 21:36, Aleksander Alekseev
> wrote:
>
>> Perhaps there was a good
>> reason to update the VM *before* creating WAL records I'm unaware of.
>
> Looks like 503c730 intentionally does it this way; however, I have not
> yet fully underst
10.08.2025 08:45, Kirill Reshke wrote:
> On Sun, 10 Aug 2025 at 01:55, Aleksander Alekseev
> wrote:
>> For this reason we have PageHeaderData.pd_lsn for instance - to make sure
>> pages are evicted only *after* the record that changed it is written
>> to disk (because WAL records can't be applied
On 8/19/25 11:14, Kirill Reshke wrote:
> This thread is a candidate for [0]
>
>
> [0] https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
>
Added, with Alexander as an owner (assuming it really is caused by
commit bc22dc0e0d).
regards
-
This thread is a candidate for [0]
[0]https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
Best regards,
Kirill Reshke
On Tue, 19 Aug 2025 at 10:32, Thomas Munro wrote:
>
> I don't know if there are other ways that LWLockReleaseAll() can lead
> to persistent corruption that won't be corrected by crash recovery,
> but this one is probably new since the following commit, explaining
> the failure to reproduce on v17
On Tue, 19 Aug 2025 at 11:13, Tom Lane wrote:
>
> Thomas Munro writes:
> > On Tue, Aug 19, 2025 at 4:52 AM Tom Lane wrote:
> >> But I'm of the opinion that proc_exit
> >> is the wrong thing to use after seeing postmaster death, critical
> >> section or no. We should assume that system integrity
Thomas Munro writes:
> On Tue, Aug 19, 2025 at 4:52 AM Tom Lane wrote:
>> But I'm of the opinion that proc_exit
>> is the wrong thing to use after seeing postmaster death, critical
>> section or no. We should assume that system integrity is already
>> compromised, and get out as fast as we can w
Hi! Thank you for putting attention to this.
On Tue, 19 Aug 2025 at 10:32, Thomas Munro wrote:
>
> On Tue, Aug 19, 2025 at 4:52 AM Tom Lane wrote:
> > But I'm of the opinion that proc_exit
> > is the wrong thing to use after seeing postmaster death, critical
> > section or no. We should assume
On Tue, Aug 19, 2025 at 4:52 AM Tom Lane wrote:
> But I'm of the opinion that proc_exit
> is the wrong thing to use after seeing postmaster death, critical
> section or no. We should assume that system integrity is already
> compromised, and get out as fast as we can with as few side-effects
> as
Kirill Reshke writes:
> On Sun, 17 Aug 2025 at 19:33, Tom Lane wrote:
>> I do not like this patch one bit: it will replace one set of problems
>> with another set, namely systems that fail to shut down.
> I did not observe this during my by-hand testing. I am under the
> impression that CRIT sec
On Mon, 18 Aug 2025 at 13:15, I wrote:
> > I do not like this patch one bit: it will replace one set of problems
> > with another set, namely systems that fail to shut down.
>
> I did not observe this during my by-hand testing.
I am sorry: I was wrong. This is exactly what happens in this test
(mo
Hi! Thank you for putting attention to this.
On Sun, 17 Aug 2025 at 19:33, Tom Lane wrote:
>
> Kirill Reshke writes:
> > [ v1-0001-Do-not-exit-on-postmaster-death-ever-inside-CRIT-.patch ]
>
> I do not like this patch one bit: it will replace one set of problems
> with another set, namely system
> On 17 Aug 2025, at 17:33, Tom Lane wrote:
>
> So I think the correct fix here is s/proc_exit(1)/_exit(2)/ in the
> places that are responding to postmaster death.
+1. But should we _exit(2) only in critical section or always in case of
postmaster death?
Another question that was botherin
Kirill Reshke writes:
> [ v1-0001-Do-not-exit-on-postmaster-death-ever-inside-CRIT-.patch ]
I do not like this patch one bit: it will replace one set of problems
with another set, namely systems that fail to shut down.
I think the actual bug here is the use of proc_exit(1) after
observing postma
On Thu, 14 Aug 2025 at 10:41, Kirill Reshke wrote:
>
> The scenario I am trying to reproduce is the following:
>
> 1) Some process p1 locks some buffer (name it buf1), enters CRIT
> section, calls MarkBufferDirty and hangs inside XLogInsert on CondVar
> in (GetXLogBuffer -> AdvanceXLInsertBuffer).
> 2) CHECKPOINT
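For context, the window in step 1 comes from the usual heap-modification
pattern, condensed below (illustrative; the real code is in heap_insert(),
heap_lock_tuple() and friends):
```c
/* Heavily condensed heap AM pattern; illustrative, not verbatim source. */
START_CRIT_SECTION();

if (PageIsAllVisible(page))
{
    PageClearAllVisible(page);                      /* page bit cleared...   */
    visibilitymap_clear(relation, block, vmbuffer,  /* ...and VM bit cleared */
                        VISIBILITYMAP_VALID_BITS);
}
MarkBufferDirty(buffer);                            /* buffer is dirty now   */

/*
 * WAL is generated only now.  On v18, XLogInsert() can reach
 * GetXLogBuffer() -> AdvanceXLInsertBuffer() and sleep on a condition
 * variable; if the backend is forced to exit there, a dirty page exists
 * with no WAL record describing the change.
 */
recptr = XLogInsert(RM_HEAP_ID, info);
PageSetLSN(page, recptr);

END_CRIT_SECTION();
```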
On Wed, 13 Aug 2025 at 16:15, I wrote:
> I did not find any doc or other piece of information indicating
> whether WaitEventSetWait is allowed inside critical sections. But I do
> think this is bad, because we do not process interrupts during
> critical sections, so it is unclear to me why we shou
On Tue, 12 Aug 2025 at 13:00, I wrote:
> While this aims to find existing VM corruption (I mean, in PG <= 17),
> this reproducer does not seem to work on pg17. At least, I did not
> manage to reproduce this scenario on pg17.
>
> This makes me think this exact corruption may be pg18-only. Is it
> p
On Wed, 6 Aug 2025 at 20:00, Andrey Borodin wrote:
>
> Hi hackers!
>
> I was reviewing the patch about removing xl_heap_visible and found the VM\WAL
> machinery very interesting.
> At Yandex we had several incidents with corrupted VM and on pgconf.dev
> colleagues from AWS confirmed that they sa
On Tue, 12 Aug 2025 at 10:38, I wrote:
> CHECKPOINT
> somehow manages to flush the heap page when the instance is kill -9'ed.
This corruption does not reproduce without the CHECKPOINT call; however, I
do not see any suspicious syscall made by the CHECKPOINT process.
It does not write anything to disk here, isn
On Wed, 6 Aug 2025 at 20:00, Andrey Borodin wrote:
>
> Hi hackers!
>
> I was reviewing the patch about removing xl_heap_visible and found the VM\WAL
> machinery very interesting.
> At Yandex we had several incidents with corrupted VM and on pgconf.dev
> colleagues from AWS confirmed that they sa
> On 9 Aug 2025, at 23:54, Aleksander Alekseev wrote:
>
> IMHO: logging the changes first, then allowing to evict the page.
The VM and buffer manager code does not allow a buffer to be flushed until its
changes are logged.
The problem is that our crash-exiting system destroys locks that protect buffer
fr
On Sun, 10 Aug 2025 at 01:55, Aleksander Alekseev
wrote:
> For this reason we have PageHeaderData.pd_lsn for instance - to make sure
> pages are evicted only *after* the record that changed it is written
> to disk (because WAL records can't be applied to pages from the
> future).
We don't bump th
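For reference, the rule Aleksander describes is enforced in the buffer flush
path roughly as follows (condensed from FlushBuffer() in bufmgr.c; illustrative
only):
```c
/* Condensed sketch of the WAL-before-data rule in the flush path; not verbatim. */
XLogRecPtr  lsn = PageGetLSN(page);   /* pd_lsn, set by PageSetLSN() after XLogInsert() */

/*
 * WAL up to the page's LSN must be durable before the page itself may be
 * written, so no on-disk page ever describes a change whose WAL is missing.
 */
XLogFlush(lsn);

/* ... only then is the page handed to smgrwrite() ... */
```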
Hi Andrey,
> 0. checkpointer is going to flush a heap buffer but waits on content lock
> 1. client is resetting PD_ALL_VISIBLE from page
> 2. postmaster is killed and command client to go down
> 3. client calls LWLockReleaseAll() at ProcKill() (?)
> 4. checkpointer flushes buffer with reset PG_ALL
On Thu, 7 Aug 2025 at 21:36, Aleksander Alekseev
wrote:
> Perhaps there was a good
> reason to update the VM *before* creating WAL records I'm unaware of.
Looks like 503c730 intentionally does it this way; however, I have not
yet fully understood the reasoning behind it.
--
Best regards,
Kir
> On 9 Aug 2025, at 18:28, Andrey Borodin wrote:
>
> Also I investigated that in a moment of kill -9 checkpointer flushes heap
> page to disk despite content lock. I haven't found who released content lock
> though.
Having written this message, I understood: it's LWLockReleaseAll().
0. check
> On 7 Aug 2025, at 19:36, Aleksander Alekseev wrote:
>
> Maybe I misunderstood the intent of the test.
You understood precisely my intent in writing the test. But it fails not due to
the bug I anticipated!
So far I have noticed that if I move the injection point before
PageClearAllVisible(BufferGetPa
Hi Andrey,
> the test passes because you moved injection point to a very safe position
> [...]
> I want to emphasize that it seems to me that position of injection point is
> not a hint, but rather coincidental.
Well, I wouldn't say that the test passes merely because the location
of the injecti
> On 7 Aug 2025, at 17:09, Aleksander Alekseev wrote:
>
> If my understanding is correct, we should make a WAL record with the
> XLH_LOCK_ALL_FROZEN_CLEARED flag *before* we modify the VM but within
> the same critical section (in order to avoid race conditions within
> the same backend).
Wel
> On 7 Aug 2025, at 18:54, Andrey Borodin wrote:
>
> moved injection point to a very safe position.
BTW, your fix also fixes the ALL_FROZEN stuff, just because the WAL for the heap
insert is already emitted by the time of the kill -9.
I want to emphasize that it seems to me that position of injection point is n
Hi,
> If my understanding is correct, we should make a WAL record with the
> XLH_LOCK_ALL_FROZEN_CLEARED flag *before* we modify the VM but within
> the same critical section [...]
>
> A draft patch is attached. It makes the test pass and doesn't seem to
> break any other tests.
>
> Thoughts?
In
> > This is a tricky bug. Do you also have a proposal of a particular fix?
>
> If my understanding is correct, we should make a WAL record with the
> XLH_LOCK_ALL_FROZEN_CLEARED flag *before* we modify the VM but within
> the same critical section (in order to avoid race conditions within
> the sam
Hi again,
> I meant instance, not backend. Sorry for confusion.
It looks like I completely misunderstood what START_CRIT_SECTION() /
END_CRIT_SECTION() are for here. Simply ignore this part :) Apologies
for the noise.
Hi,
> This is a tricky bug. Do you also have a proposal of a particular fix?
If my understanding is correct, we should make a WAL record with the
XLH_LOCK_ALL_FROZEN_CLEARED flag *before* we modify the VM but within
the same critical section (in order to avoid race conditions within
the same back
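Schematically, the proposed ordering for the heap_lock_tuple() path would look
something like the sketch below (the idea only, not the attached draft patch):
```c
/* Sketch of the proposed ordering; WAL record construction is elided. */
START_CRIT_SECTION();

/* tuple/page changes and MarkBufferDirty() happen here, as before */

/* 1. Emit the WAL record first, with XLH_LOCK_ALL_FROZEN_CLEARED in its flags. */
XLogBeginInsert();
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
/* ... register the xl_heap_lock payload ... */
recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK);
PageSetLSN(BufferGetPage(buffer), recptr);

/* 2. Only then clear the all-frozen bit in the visibility map. */
visibilitymap_clear(relation, block, vmbuffer, VISIBILITYMAP_ALL_FROZEN);

END_CRIT_SECTION();
```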
Hi Andrey,
> I was reviewing the patch about removing xl_heap_visible and found the VM\WAL
> machinery very interesting.
> At Yandex we had several incidents with corrupted VM and on pgconf.dev
> colleagues from AWS confirmed that they saw something similar too.
> So I toyed around and accidenta