Hi,
On 2023-06-19 10:04:35 +, Evgeny Morozov wrote:
> There haven't been any updates posted to
> https://www.postgresql.org/message-id/20230509040203.z6mvijumv7wxcuib%40awork3.anarazel.de
> so I just wanted to check if there is any update on the status of the
> patch? Can we expect it in Postg
There haven't been any updates posted to
https://www.postgresql.org/message-id/20230509040203.z6mvijumv7wxcuib%40awork3.anarazel.de
so I just wanted to check if there is any update on the status of the
patch? Can we expect it in PostgreSQL 15.4? Thanks.
On 17/05/2023 1:39 am, Andres Freund wrote:
> Try to prevent the DROP DATABASE from getting cancelled :/.
I still don't know why that's happening. I mean, I know why it gets
cancelled (the client timeout we set in Npgsql), but I don't know why
the drop does not succeed within 30 seconds. We could,
Hi,
On 2023-05-16 14:20:46 +, Evgeny Morozov wrote:
> On 9/05/2023 3:32 am, Andres Freund wrote:
> > Attached is a rough prototype of that idea (only using datconnlimit ==
> > -2 for now).
> > I guess we need to move this to -hackers. Perhaps I'll post subsequent
> > versions below
> > https:/
On Tue, May 16, 2023 at 10:20 AM Evgeny Morozov <
postgres...@realityexists.net> wrote:
> On 9/05/2023 3:32 am, Andres Freund wrote:
> > Attached is a rough prototype of that idea (only using datconnlimit ==
> > -2 for now).
> > I guess we need to move this to -hackers. Perhaps I'll post subsequen
On 9/05/2023 3:32 am, Andres Freund wrote:
> Attached is a rough prototype of that idea (only using datconnlimit ==
> -2 for now).
> I guess we need to move this to -hackers. Perhaps I'll post subsequent
> versions below
> https://www.postgresql.org/message-id/20230314174521.74jl6ffqsee5mtug%40awor
On Wed, May 10, 2023 at 9:32 AM Evgeny Morozov <
postgres...@realityexists.net> wrote:
> On 10/05/2023 6:39 am, Kirk Wolak wrote:
>
> It could be as simple as creating temp tables in the other database (since
> I believe pg_class was hit).
>
> We do indeed create temp tables, both in other databas
On 10/05/2023 6:39 am, Kirk Wolak wrote:
> It could be as simple as creating temp tables in the other database
> (since I believe pg_class was hit).
We do indeed create temp tables, both in other databases and in the ones
being tested. (We also create non-temp tables there.)
>
> Also, not sure if t
On Sun, May 7, 2023 at 10:18 PM Thomas Munro wrote:
> On Mon, May 8, 2023 at 4:10 AM Evgeny Morozov
> wrote:
> > On 6/05/2023 11:13 pm, Thomas Munro wrote:
> > > Would you like to try requesting FILE_COPY for a while and see if it
> > > eventually happens like that too?
> > Sure, we can try that.
>
On 8/05/2023 11:04 pm, Andres Freund wrote:
> Are you using any extensions?
Only plpgsql.
> Do you have any chance to figure out what statements were running
> concurrently with the DROP DATABASE?
No. Is there some way to log that, other than just logging all
statements (which seems impractical)?
On Tue, May 9, 2023 at 3:15 AM Michael Paquier wrote:
>
> On Mon, May 08, 2023 at 07:15:20PM +0530, Dilip Kumar wrote:
> > I am able to reproduce this using the steps given above, I am also
> > trying to analyze this further. I will send the update once I get
> > some clue.
>
> Have you been able
Hi,
On 2023-05-08 17:46:37 -0700, Andres Freund wrote:
> My current gut feeling is that we should use datconnlimit == -2 to prevent
> connections after reaching DropDatabaseBuffers() in dropdb(), and use a new
> column in 16, for both createdb() and dropdb().
Attached is a rough prototype of that
Hi,
On 2023-05-08 14:04:00 -0700, Andres Freund wrote:
> But perhaps a similar approach could be the solution? My gut says that the
> rough direction might allow us to keep dropdb() a single transaction.
I started to hack on the basic approach of committing after the catalog
changes. But then I
On Mon, May 08, 2023 at 06:04:23PM -0400, Tom Lane wrote:
> Andres seems to think it's a problem with aborting a DROP DATABASE.
> Adding more data might serve to make the window wider, perhaps.
And the odds get indeed much better once I use these two toys:
CREATE OR REPLACE FUNCTION create_tables(
On Tue, May 9, 2023 at 10:04 AM Tom Lane wrote:
> Michael Paquier writes:
> > One thing I was wondering about to improve the odds of the hits is to
> > be more aggressive with the number of relations created at once, so as
> > we are much more aggressive with the number of pages extended in
> > p
Michael Paquier writes:
> One thing I was wondering about to improve the odds of the hits is to
> be more aggressive with the number of relations created at once, so as
> we are much more aggressive with the number of pages extended in
> pg_class from the origin database.
Andres seems to think it
On Mon, May 08, 2023 at 07:15:20PM +0530, Dilip Kumar wrote:
> I am able to reproduce this using the steps given above, I am also
> trying to analyze this further. I will send the update once I get
> some clue.
Have you been able to reproduce this on HEAD or at the top of
REL_15_STABLE, or is tha
Hi,
On 2023-05-08 20:27:14 +, Evgeny Morozov wrote:
> On 8/05/2023 9:47 pm, Andres Freund wrote:
> > Did you have any occasions where CREATE or DROP DATABASE was interrupted?
> > Either due to the connection being terminated or a crash?
>
> I've uploaded an edited version of the PG log for the ti
On 8/05/2023 9:47 pm, Andres Freund wrote:
> Did you have any occasions where CREATE or DROP DATABASE was interrupted?
> Either due to the connection being terminated or a crash?
I've uploaded an edited version of the PG log for the time as
https://objective.realityexists.net/temp/log-extract-2023-05
Hi,
On 2023-05-07 16:10:28 +, Evgeny Morozov wrote:
> Yes, kind of. We have a test suite that creates one test DB and runs a
> bunch of tests on it. Two of these tests, however, create another DB
> each (also by cloning the same template DB) in order to test copying
> data between DBs. It's on
Alvaro Herrera writes:
> Maybe it would be sensible to make STRATEGY_FILE=FILE_COPY the default
> again, for branch 15, before today's release.
If we had more than one such report, I'd be in favor of that.
But I think it's a bit premature to conclude that the copy
strategy is to blame.
On Mon, May 8, 2023 at 7:55 AM Michael Paquier wrote:
>
> On Sun, May 07, 2023 at 10:30:52PM +1200, Thomas Munro wrote:
> > Bug-in-PostgreSQL explanations could include that we forgot it was
> > dirty, or some backend wrote it out to the wrong file; but if we were
> > forgetting something like per
On 2023-May-07, Thomas Munro wrote:
> Did you previously run this same workload on versions < 15 and never
> see any problem? 15 gained a new feature CREATE DATABASE ...
> STRATEGY=WAL_LOG, which is also the default. I wonder if there is a
> bug somewhere near that, though I have no specific ide
On 8/05/2023 4:24 am, Michael Paquier wrote:
> here are the four things running in parallel so as I can get a failure
> in loading a critical index when connecting
Wow, that is some amazing detective work! We do indeed create tables
during our tests, specifically partitions of tables copied from t
On Mon, May 08, 2023 at 02:46:37PM +1200, Thomas Munro wrote:
> That sounds like good news, but I'm still confused: do you see all 0s
> in the target database (popo)'s catalogs, as reported (and if so can
> you explain how they got there?), or is it regression that is
> corrupted in more subtle way
On Mon, May 8, 2023 at 2:24 PM Michael Paquier wrote:
> I can reproduce the same backtrace here. That's just my usual laptop
> with ext4, so this would be a Postgres bug. First, here are the four
> things running in parallel so as I can get a failure in loading a
> critical index when connecting
On Sun, May 07, 2023 at 10:30:52PM +1200, Thomas Munro wrote:
> Bug-in-PostgreSQL explanations could include that we forgot it was
> dirty, or some backend wrote it out to the wrong file; but if we were
> forgetting something like permanent or dirty, would there be a more
> systematic failure? Oh,
On Mon, May 8, 2023 at 4:10 AM Evgeny Morozov
wrote:
> On 6/05/2023 11:13 pm, Thomas Munro wrote:
> > Would you like to try requesting FILE_COPY for a while and see if it
> > eventually happens like that too?
> Sure, we can try that.
Maybe you could do some one way and some the other, so that we
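One way to split test runs between the two strategies, as a minimal Python sketch. It only builds the SQL text; the helper names and the database/template names in the usage are hypothetical, and it assumes the PostgreSQL 15 `CREATE DATABASE ... STRATEGY = ...` syntax with the `WAL_LOG` and `FILE_COPY` values discussed in this thread.

```python
from itertools import cycle

# The two CREATE DATABASE strategies available in PostgreSQL 15.
STRATEGIES = ("WAL_LOG", "FILE_COPY")


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_db_statements(db_names, template):
    """Return one CREATE DATABASE statement per name, alternating strategies.

    Hypothetical helper: the caller is expected to execute each statement
    and record which strategy each database was created with.
    """
    strategies = cycle(STRATEGIES)
    return [
        f"CREATE DATABASE {quote_ident(name)} "
        f"TEMPLATE {quote_ident(template)} STRATEGY = {strategy}"
        for name, strategy in zip(db_names, strategies)
    ]
```

Recording the strategy alongside each database name would let a later corruption report be matched to the strategy that created the database.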
On 6/05/2023 11:13 pm, Thomas Munro wrote:
> Did you previously run this same workload on versions < 15 and never
> see any problem?
Yes, kind of. We have a test suite that creates one test DB and runs a
bunch of tests on it. Two of these tests, however, create another DB
each (also by cloning the
On Sun, May 7, 2023 at 1:21 PM Tom Lane wrote:
> Thomas Munro writes:
> > Did you previously run this same workload on versions < 15 and never
> > see any problem? 15 gained a new feature CREATE DATABASE ...
> > STRATEGY=WAL_LOG, which is also the default. I wonder if there is a
> > bug somewhe
Thomas Munro writes:
> Did you previously run this same workload on versions < 15 and never
> see any problem? 15 gained a new feature CREATE DATABASE ...
> STRATEGY=WAL_LOG, which is also the default. I wonder if there is a
> bug somewhere near that, though I have no specific idea.
Per the rel
On Sun, May 7, 2023 at 10:23 AM Jeffrey Walton wrote:
> This may be related... I seem to recall the GNUlib folks talking about
> a cp bug on sparse files. It looks like it may be fixed in coreutils
> release 9.2 (2023-03-20):
> https://github.com/coreutils/coreutils/blob/master/NEWS#L233
>
> If I
On Sat, May 6, 2023 at 6:35 AM Thomas Munro wrote:
>
> On Sat, May 6, 2023 at 9:58 PM Evgeny Morozov
> wrote:
> > Right - I should have realised that! base/1414389/2662 is indeed all
> > nulls, 32KB of them. I included the file anyway in
> > https://objective.realityexists.net/temp/pgstuff2.zip
>
On Sun, May 7, 2023 at 12:29 AM Evgeny Morozov
wrote:
> On 6/05/2023 12:34 pm, Thomas Munro wrote:
> > So it does indeed look like something unknown has replaced 32KB of
> > data with 32KB of zeroes underneath us. Are there more non-empty
> > files that are all-zeroes? Something like this might
On 6/05/2023 12:34 pm, Thomas Munro wrote:
> So it does indeed look like something unknown has replaced 32KB of
> data with 32KB of zeroes underneath us. Are there more non-empty
> files that are all-zeroes? Something like this might find them:
>
> for F in base/1414389/*
> do
> if [ -s $F ] &&
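A minimal Python sketch of the check the truncated shell loop above appears to be after: list non-empty files in a database directory that contain only zero bytes. This is not the script from the thread; the function names are made up for illustration, and the 8192-byte page size assumes PostgreSQL's default block size.

```python
import os

PAGE_SIZE = 8192  # assumption: PostgreSQL built with the default 8KB block size


def is_all_zero(path, chunk_size=PAGE_SIZE):
    """Return True if the file is non-empty and contains only zero bytes."""
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end of file without seeing a non-zero byte.
                return True
            if chunk.count(0) != len(chunk):
                return False


def find_all_zero_files(directory):
    """Return (file name, size in pages) for each non-empty all-zero file."""
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and is_all_zero(path):
            hits.append((name, os.path.getsize(path) // PAGE_SIZE))
    return hits
```

Pointed at a directory such as base/1414389, a 32KB all-zero file like the 2662 file mentioned in this thread would be reported as 4 pages.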
On Sat, May 6, 2023 at 9:58 PM Evgeny Morozov
wrote:
> Right - I should have realised that! base/1414389/2662 is indeed all
> nulls, 32KB of them. I included the file anyway in
> https://objective.realityexists.net/temp/pgstuff2.zip
OK so it's not just page 0, you have 32KB or 4 pages of all zero
On 6/05/2023 1:06 am, Thomas Munro wrote:
> Next can you share the file base/1414389/2662? ("5" was from the wrong
> database.)
Right - I should have realised that! base/1414389/2662 is indeed all
nulls, 32KB of them. I included the file anyway in
https://objective.realityexists.net/temp/pgstuff2
On Fri, May 5, 2023 at 7:50 PM Evgeny Morozov
wrote:
> The OID of the bad DB ('test_behavior_638186279733138190') is 1414389 and
> I've uploaded base/1414389/pg_filenode.map and also base/5/2662 (in case
> that's helpful) as https://objective.realityexists.net/temp/pgstuff1.zip
Thanks. That pg
On 5/05/2023 10:38 am, Andrew Gierth wrote:
> sudo -u postgres psql -w -p 5434 -d "options='-P'"
> (make that -d "dbname=whatever options='-P'" if you need to specify
> some database name; or use PGOPTIONS="-P" in the environment.)
Thanks, good to know! Unfortunately that also fails:
# sudo -u p
> "Evgeny" == Evgeny Morozov writes:
Evgeny> Indeed, I cannot get that far due to the same error. I read
Evgeny> about ignore_system_indexes, but...
Evgeny> # sudo -u postgres psql -w -p 5434 -c "set ignore_system_indexes=on";
Evgeny> ERROR: parameter "ignore_system_indexes" cannot be s
On 5/05/2023 2:02 am, Thomas Munro wrote:
> On Fri, May 5, 2023 at 11:15 AM Thomas Munro wrote:
>> What does select
>> pg_relation_filepath('pg_class_oid_index') show in the corrupted
>> database, base/5/2662 or something else?
> Oh, you can't get that far, but perhaps you could share the
> pg_fil
On Fri, May 5, 2023 at 11:15 AM Thomas Munro wrote:
> What does select
> pg_relation_filepath('pg_class_oid_index') show in the corrupted
> database, base/5/2662 or something else?
Oh, you can't get that far, but perhaps you could share the
pg_filenode.map file? Or alternatively strace -f Postgr
On Fri, May 5, 2023 at 11:15 AM Thomas Munro wrote:
> Now *that* is a piece of
> logic that changed in PostgreSQL 15. It changed from sector-based
> atomicity assumptions to a directory entry swizzling trick, in commit
> d8cd0c6c95c0120168df93aae095df4e0682a08a. Hmm.
I spoke too soon, that only
On Fri, May 5, 2023 at 6:11 AM Evgeny Morozov
wrote:
> Meanwhile, what do I do with the existing server, though? Just try to
> drop the problematic DBs again manually?
That earlier link to a FreeBSD thread is surely about bleeding edge
new ZFS stuff that was briefly broken then fixed, being disco
On 5/4/23 13:10, Evgeny Morozov wrote:
[snip]
I'm now thinking of setting up a dedicated AWS EC2 instance just for
these little DBs that get created by our automated tests. If the problem
happens there as well then that would strongly point towards a bug in
PostgreSQL, wouldn't it?
Many other p
On 4/05/2023 6:42 pm, Laurenz Albe wrote:
> On Thu, 2023-05-04 at 15:49 +, Evgeny Morozov wrote:
>> Well, the problem happened again! Kind of... This time PG has not
>> crashed with the PANIC error in the subject, but pg_dumping certain DBs
>> again fails with
>>
>>
>> pg_dump: error: connectio
On Thu, 2023-05-04 at 15:49 +, Evgeny Morozov wrote:
> Well, the problem happened again! Kind of... This time PG has not
> crashed with the PANIC error in the subject, but pg_dumping certain DBs
> again fails with
>
>
> pg_dump: error: connection to server on socket
> "/var/run/postgresql/.s.
On 14/04/2023 10:42 am, Alban Hertroys wrote:
> Your problem coincides with a thread at freebsd-current with very
> similar data corruption after a recent OpenZFS import: blocks of all
> zeroes, but also missing files. So, perhaps these problems are related?
> Apparently, there was a recent fix for
> On 14 Apr 2023, at 9:38, Evgeny Morozov wrote:
(…)
> I don't know whether ZFS zero-fills blocks on disk errors. As I
> understood, ZFS should have been able to recover from disk errors (that
> were "unrecoverable" at the hardware level) using the data on the other
> two disks (which did not
> Hmm, I am not certain. The block was filled with zeros from your error
> message, and I think such blocks don't trigger a checksum warning.
OK, so data_checksums=on might not have made any difference in this case?
> So if your disk replaces a valid block with zeros (filesystem check
> after cr
On Fri, 14 Apr 2023, Laurenz Albe wrote:
>So if your disk replaces a valid block with zeros (filesystem check
>after crash?), that could explain what you see.
Oh, I had that happen on a RAID 1 once. One of the two discs had an
intermittent error (write I guess) but didn’t fail out of the RAID,
and
On Thu, 2023-04-13 at 19:07 +, Evgeny Morozov wrote:
> On 13/04/2023 5:02 pm, Laurenz Albe wrote:
> > It means that if the error is caused by a faulty disk changing your data,
> > you'll notice as soon as you touch the page.
> >
> > That would perhaps not have made a lot of difference in your
On 13/04/2023 5:02 pm, Laurenz Albe wrote:
> It means that if the error is caused by a faulty disk changing your data,
> you'll notice as soon as you touch the page.
>
> That would perhaps not have made a lot of difference in your case,
> except that the error message would have been different and
On Thu, 2023-04-13 at 06:56 +, Evgeny Morozov wrote:
> On 12/04/2023 2:35 am, Michael Paquier wrote:
> > initdb does not enable checksums by default, requiring a
> > -k/--data-checksums, so likely this addition comes from your
> > environment.
>
> OK, so then what does that mean for the e
On 12/04/2023 2:35 am, Michael Paquier wrote:
> initdb does not enable checksums by default, requiring a
> -k/--data-checksums, so likely this addition comes from your
> environment.
Indeed, turns out we had it in init_db_options.
> However, the docs say "Only
>> data pages are protected by
On Tue, Apr 11, 2023 at 04:44:54PM +, Evgeny Morozov wrote:
> We have data_checksums=on. (It must be on by default, since I cannot
> find that in our config files anywhere.)
initdb does not enable checksums by default, requiring a
-k/--data-checksums, so likely this addition comes from yo
> No idea about the former, but bad hardware is a good enough explanation.
> As to keeping it from happening: use good hardware.
Alright, thanks, I'll just keep my fingers crossed that it doesn't
happen again then!
> Also: Use checksums. PostgreSQL offers data checksums[1]. Some
> filesystems also
On 2023-04-07 13:04:34 +0200, Laurenz Albe wrote:
> On Thu, 2023-04-06 at 16:41 +, Evgeny Morozov wrote:
> > What can I do to figure out why this is happening and prevent it from
> > happening again?
>
> No idea about the former, but bad hardware is a good enough explanation.
>
> As to keepi
On Fri, Apr 07, 2023 at 01:04:34PM +0200, Laurenz Albe wrote:
> On Thu, 2023-04-06 at 16:41 +, Evgeny Morozov wrote:
>> Could this be a PG bug?
>
> It could be, but data corruption caused by bad hardware is much more likely.
There is no way to be completely sure here, except if we would be ab
On Thu, 2023-04-06 at 16:41 +, Evgeny Morozov wrote:
> Our PostgreSQL 15.2 instance running on Ubuntu 18.04 has crashed with this
> error:
>
> 2023-04-05 09:24:03.448 UTC [15227] ERROR: index "pg_class_oid_index"
> contains unexpected zero page at block 0
> [...]
>
> We had the same thin
Our PostgreSQL 15.2 instance running on Ubuntu 18.04 has crashed with
this error:
2023-04-05 09:24:03.448 UTC [15227] ERROR: index "pg_class_oid_index"
contains unexpected zero page at block 0
2023-04-05 09:24:03.448 UTC [15227] HINT: Please REINDEX it.
...
2023-04-05 13:05:25.018 UTC [15437]
ro