On Tue, 14 May 2024 at 08:55, David Rowley wrote:
> I've not seen any recent failures from Parula that relate to this
> issue. The last one seems to have been about 4 weeks ago.
>
> I'm now wondering if it's time to revert the debugging code added in
> 1db689715. Does anyone think differently?
David Rowley writes:
> I've not seen any recent failures from Parula that relate to this
> issue. The last one seems to have been about 4 weeks ago.
> I'm now wondering if it's time to revert the debugging code added in
> 1db689715. Does anyone think differently?
+1. It seems like we wrote
On Thu, 21 Mar 2024 at 13:53, David Rowley wrote:
>
> On Thu, 21 Mar 2024 at 12:36, Tom Lane wrote:
> > So yeah, if we could have log_autovacuum_min_duration = 0 perhaps
> > that would yield a clue.
>
> FWIW, I agree with your earlier statement about it looking very much
> like auto-vacuum has
On Tue, 16 Apr 2024 at 18:58, Robins Tharakan wrote:
> The last 25 consecutive runs have passed [1] after switching
> REL_12_STABLE to -O0 ! So I am wondering whether that confirms that
> the compiler version is to blame, and while we're still here,
> is there anything else I could try?
I don't
On Mon, 15 Apr 2024 at 16:02, Tom Lane wrote:
> David Rowley writes:
> > If GetNowFloat() somehow was returning a negative number then we could
> > end up with a large delay. But if gettimeofday() was so badly broken
> > then wouldn't there be some evidence of this in the log timestamps on
> >
David Rowley writes:
> #4 0x0090b7b4 in pg_sleep (fcinfo=<optimized out>) at misc.c:406
> delay = <optimized out>
> delay_ms = <optimized out>
> endtime = 0
> This endtime looks like a problem. It seems unlikely to be caused by
> gettimeofday's timeval fields being zeroed given that the number of
> seconds
On Mon, 15 Apr 2024 at 14:55, David Rowley wrote:
> If GetNowFloat() somehow was returning a negative number then we could
> end up with a large delay. But if gettimeofday() was so badly broken
> then wouldn't there be some evidence of this in the log timestamps on
> failing runs?
3 things
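For reference, the logic under discussion looks roughly like this -- a
from-memory sketch of pg_sleep() from src/backend/utils/adt/misc.c (details
may differ by branch; check the tree for the exact code). The locals match
the ones in the backtraces quoted in this thread:

#define GetNowFloat() ((float8) GetCurrentTimestamp() / 1000000.0)

Datum
pg_sleep(PG_FUNCTION_ARGS)
{
    float8      secs = PG_GETARG_FLOAT8(0);
    float8      endtime;

    /* absolute target time; a bogus "now" here poisons the whole sleep */
    endtime = GetNowFloat() + secs;

    for (;;)
    {
        float8      delay;
        long        delay_ms;

        CHECK_FOR_INTERRUPTS();

        /* if GetNowFloat() went backwards or negative, delay becomes huge */
        delay = endtime - GetNowFloat();
        if (delay >= 600.0)
            delay_ms = 600000;      /* re-check at least every 10 minutes */
        else if (delay > 0.0)
            delay_ms = (long) ceil(delay * 1000.0);
        else
            break;

        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                         delay_ms,
                         WAIT_EVENT_PG_SLEEP);
        ResetLatch(MyLatch);
    }

    PG_RETURN_VOID();
}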
On Mon, 15 Apr 2024 at 16:10, Robins Tharakan wrote:
> - I now have 2 separate runs stuck on pg_sleep() - HEAD / REL_16_STABLE
> - I'll keep them (stuck) for this week, in case there's more we can get
> from them (and to see how long they take)
> - Attached are 'bt full' outputs for both (b.txt -
On Sun, 14 Apr 2024 at 00:12, Tom Lane wrote:
> If we were only supposed to sleep 0.1 seconds, how is it waiting
> for 60 ms (and, presumably, repeating that)? The logic in
> pg_sleep is pretty simple, and it's hard to think of anything except
> the system clock jumping (far) backwards that
On 4/13/24 15:02, Robins Tharakan wrote:
> On Wed, 10 Apr 2024 at 10:24, David Rowley wrote:
>>
>> Master failed today for the first time since the compiler upgrade.
>> Again reltuples == 48.
>
> Here's what I can add over the past few days:
> - Almost all failures are either reltuples=48 or
On 4/9/24 05:48, David Rowley wrote:
> On Mon, 8 Apr 2024 at 23:56, Robins Tharakan wrote:
>> #3 0x0083ed84 in WaitLatch (latch=<optimized out>,
>> wakeEvents=wakeEvents@entry=41, timeout=60,
>> wait_event_info=wait_event_info@entry=150994946) at latch.c:538
>> #4 0x00907404 in
Robins Tharakan writes:
> HEAD is stuck again on pg_sleep(), no CPU for the past hour or so.
> Stack trace seems to be similar to last time.
> #3 0x008437c4 in WaitLatch (latch=<optimized out>,
> wakeEvents=wakeEvents@entry=41, timeout=60,
> wait_event_info=wait_event_info@entry=150994946) at
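Decoding the arguments in that WaitLatch() frame, assuming the usual flag
values from src/include/storage/latch.h and the wait-event class layout in
wait_event.h (worth double-checking against the branch in question):

/*
 * wakeEvents = 41 = 0x29
 *            = WL_LATCH_SET (1) | WL_TIMEOUT (8) | WL_EXIT_ON_PM_DEATH (32)
 * timeout    = 60 ms
 * wait_event_info = 150994946 = 0x09000002
 *            = PG_WAIT_TIMEOUT class + 2, i.e. WAIT_EVENT_PG_SLEEP
 *
 * That combination matches exactly what pg_sleep()'s WaitLatch() call passes.
 */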
On Mon, 8 Apr 2024 at 21:25, Robins Tharakan wrote:
>
>
> I'll keep an eye on this instance more often for the next few days.
> (Let me know if I could capture more if a run gets stuck again)
HEAD is stuck again on pg_sleep(), no CPU for the past hour or so.
Stack trace seems to be similar to
On Wed, 10 Apr 2024 at 10:24, David Rowley wrote:
>
> Master failed today for the first time since the compiler upgrade.
> Again reltuples == 48.
Here's what I can add over the past few days:
- Almost all failures are either reltuples=48 or SIGABRTs
- Almost all SIGABRTs are DDLs - CREATE INDEX
On Wed, 10 Apr 2024 at 10:24, David Rowley wrote:
> Master failed today for the first time since the compiler upgrade.
> Again reltuples == 48.
From the buildfarm members page, parula seems to be the only aarch64 + gcc 13.2
combination today, which makes me suspect that this is about gcc v13.2
On Tue, 9 Apr 2024 at 15:48, David Rowley wrote:
> Still no partition_prune failures on master since the compiler version
> change. There has been one [1] in REL_16_STABLE. I'm thinking it
> might be worth backpatching the partition_prune debug to REL_16_STABLE
> to see if we can learn anything
On Mon, 8 Apr 2024 at 23:56, Robins Tharakan wrote:
> #3 0x0083ed84 in WaitLatch (latch=<optimized out>,
> wakeEvents=wakeEvents@entry=41, timeout=60,
> wait_event_info=wait_event_info@entry=150994946) at latch.c:538
> #4 0x00907404 in pg_sleep (fcinfo=<optimized out>) at misc.c:406
> #17
On Tue, 2 Apr 2024 at 15:01, Tom Lane wrote:
> "Tharakan, Robins" writes:
> > So although HEAD ran fine, I saw multiple failures (v12, v13, v16),
> > all of which passed on subsequent tries,
> > some of which were even "signal 6: Aborted".
>
> Ugh...
parula didn't send any reports to buildfarm
"Tharakan, Robins" writes:
>> I've now switched to GCC v13.2 and triggered a run. Let's see if the tests
>> stabilize now.
> So although HEAD ran fine, I saw multiple failures (v12, v13, v16), all of
> which passed on subsequent tries,
> some of which were even "signal 6: Aborted".
Ugh...
> I've now switched to GCC v13.2 and triggered a run. Let's see if the tests
> stabilize now.
So although HEAD ran fine, I saw multiple failures (v12, v13, v16), all of
which passed on subsequent tries,
some of which were even "signal 6: Aborted".
FWIW, I compiled gcc v13.2 (default options)
> ... in connection with which, I can't help noticing that parula is using a
> very old compiler:
>
> configure: using compiler=gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-17)
>
> From some quick checking around, that would have to be near the beginning of
> aarch64
> support in RHEL (Fedora hadn't
David Rowley writes:
> On Sat, 30 Mar 2024 at 09:17, Tom Lane wrote:
>> ... in connection with which, I can't help noticing that parula
>> is using a very old compiler:
>> configure: using compiler=gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-17)
>> I wonder why parula is using that when its
On Sat, 30 Mar 2024 at 09:17, Tom Lane wrote:
>
> I wrote:
> > I'd not looked closely enough at the previous failure, because
> > now that I have, this is well out in WTFF territory: how can
> > reltuples be greater than zero when relpages is zero? This can't
> > be a state that autovacuum would
I wrote:
> I'd not looked closely enough at the previous failure, because
> now that I have, this is well out in WTFF territory: how can
> reltuples be greater than zero when relpages is zero? This can't
> be a state that autovacuum would have left behind, unless it's
> really seriously broken.
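A quick way to eyeball that state in a stuck or freshly-failed run is
something like the following (illustrative only; the relname pattern is a
guess at the test's partition naming):

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname LIKE 'ab%'
ORDER BY relname;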
David Rowley writes:
> On Wed, 27 Mar 2024 at 18:28, Tom Lane wrote:
>> Let's wait a bit to see if it fails in HEAD ... but if not, would
>> it be reasonable to back-patch the additional debugging output?
> I think REL_16_STABLE has told us that it's not an auto-vacuum issue.
> I'm uncertain
On Wed, 27 Mar 2024 at 18:28, Tom Lane wrote:
>
> David Rowley writes:
> > Unfortunately, REL_16_STABLE does not have the additional debugging,
> > so we don't get to know what reltuples was set to.
>
> Let's wait a bit to see if it fails in HEAD ... but if not, would
> it be reasonable to
David Rowley writes:
> Unfortunately, REL_16_STABLE does not have the additional debugging,
> so we don't get to know what reltuples was set to.
Let's wait a bit to see if it fails in HEAD ... but if not, would
it be reasonable to back-patch the additional debugging output?
On Tue, 26 Mar 2024 at 21:03, Tharakan, Robins wrote:
>
> > David Rowley writes:
> > It would be good to have log_autovacuum_min_duration = 0 on this machine
> > for a while.
>
> - Have set log_autovacuum_min_duration=0 on parula and a test run came out
> okay.
> - Also added REL_16_STABLE to
Hi David / Tom,
> David Rowley writes:
> It would be good to have log_autovacuum_min_duration = 0 on this machine for
> a while.
- Have set log_autovacuum_min_duration=0 on parula and a test run came out okay.
- Also added REL_16_STABLE to the branches being tested (in case it matters
here).
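For anyone following along, that setting can be applied without hand-editing
postgresql.conf, using the standard commands (superuser required):

ALTER SYSTEM SET log_autovacuum_min_duration = 0;  -- log every autovacuum/autoanalyze run
SELECT pg_reload_conf();                           -- picked up without a restart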
On Thu, 21 Mar 2024 at 14:19, Tom Lane wrote:
>
> David Rowley writes:
> > We could also do something like the attached just in case we're
> > barking up the wrong tree.
>
> Yeah, checking indisvalid isn't a bad idea. I'd put another
> one further down, just before the DROP of table ab, so we
>
David Rowley writes:
> We could also do something like the attached just in case we're
> barking up the wrong tree.
Yeah, checking indisvalid isn't a bad idea. I'd put another
one further down, just before the DROP of table ab, so we
can see the state both before and after the unstable tests.
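Such a check could look something like this (a sketch; "ab" is the test
table, and the exact form committed may differ):

SELECT indexrelid::regclass AS index, indisvalid
FROM pg_index
WHERE indrelid = 'ab'::regclass
ORDER BY 1;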
On Thu, 21 Mar 2024 at 12:36, Tom Lane wrote:
> So yeah, if we could have log_autovacuum_min_duration = 0 perhaps
> that would yield a clue.
FWIW, I agree with your earlier statement about it looking very much
like auto-vacuum has run on that table, but equally, if something like
the pg_index
David Rowley writes:
> Is it worth running that animal with log_autovacuum_min_duration = 0
> so we can see what's going on in terms of auto-vacuum auto-analyze in
> the log?
Maybe, but I'm not sure. I thought that if parula were somehow
hitting an ill-timed autovac/autoanalyze, it should be
On Wed, 20 Mar 2024 at 08:58, Tom Lane wrote:
> I suppose we could attach "autovacuum=off" settings to these tables,
> but it doesn't seem to me that that should be necessary. These test
> cases are several years old and haven't given trouble before.
> Moreover, if that's necessary then there
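For the record, that would amount to something like the following, applied to
each leaf partition (the partition name here is hypothetical, and note that
autovacuum reloptions attach to leaf partitions rather than the partitioned
parent):

ALTER TABLE ab_a1_b1 SET (autovacuum_enabled = off);  -- hypothetical leaf partition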
On Wed, 20 Mar 2024 at 11:50, Matthias van de Meent wrote:
>
> On Tue, 19 Mar 2024 at 20:58, Tom Lane wrote:
> >
> > For the last few days, buildfarm member parula has been intermittently
> > failing the partition_prune regression test, due to unexpected plan
> > changes [1][2][3][4]. The
On Tue, 19 Mar 2024 at 20:58, Tom Lane wrote:
>
> For the last few days, buildfarm member parula has been intermittently
> failing the partition_prune regression test, due to unexpected plan
> changes [1][2][3][4]. The symptoms can be reproduced exactly by
> inserting a "vacuum" of one or
For the last few days, buildfarm member parula has been intermittently
failing the partition_prune regression test, due to unexpected plan
changes [1][2][3][4]. The symptoms can be reproduced exactly by
inserting a "vacuum" of one or another of the partitions of table
"ab", so we can presume that