Re: "PANIC: could not open critical system index 2662" - twice

2023-07-13 Thread Andres Freund
Hi, On 2023-06-19 10:04:35 +, Evgeny Morozov wrote: > There haven't been any updates posted to > https://www.postgresql.org/message-id/20230509040203.z6mvijumv7wxcuib%40awork3.anarazel.de > so I just wanted to check if there is any update on the status of the > patch? Can we expect it in Postg

Re: "PANIC: could not open critical system index 2662" - twice

2023-06-19 Thread Evgeny Morozov
There haven't been any updates posted to https://www.postgresql.org/message-id/20230509040203.z6mvijumv7wxcuib%40awork3.anarazel.de so I just wanted to check if there is any update on the status of the patch? Can we expect it in PostgreSQL 15.4? Thanks.

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-17 Thread Evgeny Morozov
On 17/05/2023 1:39 am, Andres Freund wrote: > Try to prevent the DROP DATABASE from getting cancelled :/. I still don't know why that's happening. I mean, I know why it gets cancelled (the client timeout we set in Npgsql), but I don't know why the drop does not succeed within 30 seconds. We could,

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-16 Thread Andres Freund
Hi, On 2023-05-16 14:20:46 +, Evgeny Morozov wrote: > On 9/05/2023 3:32 am, Andres Freund wrote: > > Attached is a rough prototype of that idea (only using datconnlimit == > > -2 for now). > > I guess we need to move this to -hackers. Perhaps I'll post subsequent > > versions below > > https:/

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-16 Thread Kirk Wolak
On Tue, May 16, 2023 at 10:20 AM Evgeny Morozov < postgres...@realityexists.net> wrote: > On 9/05/2023 3:32 am, Andres Freund wrote: > > Attached is a rough prototype of that idea (only using datconnlimit == > > -2 for now). > > I guess we need to move this to -hackers. Perhaps I'll post subsequen

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-16 Thread Evgeny Morozov
On 9/05/2023 3:32 am, Andres Freund wrote: > Attached is a rough prototype of that idea (only using datconnlimit == > -2 for now). > I guess we need to move this to -hackers. Perhaps I'll post subsequent > versions below > https://www.postgresql.org/message-id/20230314174521.74jl6ffqsee5mtug%40awor

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-11 Thread Kirk Wolak
On Wed, May 10, 2023 at 9:32 AM Evgeny Morozov < postgres...@realityexists.net> wrote: > On 10/05/2023 6:39 am, Kirk Wolak wrote: > > It could be as simple as creating temp tables in the other database (since > I believe pg_class was hit). > > We do indeed create temp tables, both in other databas

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-10 Thread Evgeny Morozov
On 10/05/2023 6:39 am, Kirk Wolak wrote: > It could be as simple as creating temp tables in the other database > (since I believe pg_class was hit). We do indeed create temp tables, both in other databases and in the ones being tested. (We also create non-temp tables there.) > > Also, not sure if t

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-09 Thread Kirk Wolak
On Sun, May 7, 2023 at 10:18 PM Thomas Munro wrote: > On Mon, May 8, 2023 at 4:10 AM Evgeny Morozov > wrote: > > On 6/05/2023 11:13 pm, Thomas Munro wrote: > > > Would you like to try requesting FILE_COPY for a while and see if it > eventually happens like that too? > > Sure, we can try that. >

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-09 Thread Evgeny Morozov
On 8/05/2023 11:04 pm, Andres Freund wrote: > Are you using any extensions? Only plpgsql. > Do you have any chance to figure out what statements were running > concurrently with the DROP DATABASE? No. Is there some way to log that, other than just logging all statements (which seems impractical)?

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Dilip Kumar
On Tue, May 9, 2023 at 3:15 AM Michael Paquier wrote: > > On Mon, May 08, 2023 at 07:15:20PM +0530, Dilip Kumar wrote: > > I am able to reproduce this using the steps given above, I am also > > trying to analyze this further. I will send the update once I get > > some clue. > > Have you been able

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Andres Freund
Hi, On 2023-05-08 17:46:37 -0700, Andres Freund wrote: > My current gut feeling is that we should use datconnlimit == -2 to prevent > connections after reaching DropDatabaseBuffers() in dropdb(), and use a new > column in 16, for both createdb() and dropdb(). Attached is a rough prototype of that

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Andres Freund
Hi, On 2023-05-08 14:04:00 -0700, Andres Freund wrote: > But perhaps a similar approach could be the solution? My gut says that the > rough direction might allow us to keep dropdb() a single transaction. I started to hack on the basic approach of committing after the catalog changes. But then I

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Michael Paquier
On Mon, May 08, 2023 at 06:04:23PM -0400, Tom Lane wrote: > Andres seems to think it's a problem with aborting a DROP DATABASE. > Adding more data might serve to make the window wider, perhaps. And the odds get indeed much better once I use these two toys: CREATE OR REPLACE FUNCTION create_tables(

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Thomas Munro
On Tue, May 9, 2023 at 10:04 AM Tom Lane wrote: > Michael Paquier writes: > > One thing I was wondering about to improve the odds of the hits is to > > be more aggressive with the number of relations created at once, so as > > we are much more aggressive with the number of pages extended in > > p

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Tom Lane
Michael Paquier writes: > One thing I was wondering about to improve the odds of the hits is to > be more aggressive with the number of relations created at once, so as > we are much more aggressive with the number of pages extended in > pg_class from the origin database. Andres seems to think it

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Michael Paquier
On Mon, May 08, 2023 at 07:15:20PM +0530, Dilip Kumar wrote: > I am able to reproduce this using the steps given above, I am also > trying to analyze this further. I will send the update once I get > some clue. Have you been able to reproduce this on HEAD or at the top of REL_15_STABLE, or is tha

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Andres Freund
Hi, On 2023-05-08 20:27:14 +, Evgeny Morozov wrote: > On 8/05/2023 9:47 pm, Andres Freund wrote: > > Did you have any occasions where CREATE or DROP DATABASE was interrupted? > > Either due to the connection being terminated or a crash? > > I've uploaded an edited version of the PG log for the ti

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Evgeny Morozov
On 8/05/2023 9:47 pm, Andres Freund wrote: > Did you have any occasions where CREATE or DROP DATABASE was interrupted? > Either due to the connection being terminated or a crash? I've uploaded an edited version of the PG log for the time as https://objective.realityexists.net/temp/log-extract-2023-05

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Andres Freund
Hi, On 2023-05-07 16:10:28 +, Evgeny Morozov wrote: > Yes, kind of. We have a test suite that creates one test DB and runs a > bunch of tests on it. Two of these tests, however, create another DB > each (also by cloning the same template DB) in order to test copying > data between DBs. It's on

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Tom Lane
Alvaro Herrera writes: > Maybe it would be sensible to make STRATEGY_FILE=FILE_COPY the default > again, for branch 15, before today's release. If we had more than one such report, I'd be in favor of that. But I think it's a bit premature to conclude that the copy strategy is to blame.

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Dilip Kumar
On Mon, May 8, 2023 at 7:55 AM Michael Paquier wrote: > > On Sun, May 07, 2023 at 10:30:52PM +1200, Thomas Munro wrote: > > Bug-in-PostgreSQL explanations could include that we forgot it was > > dirty, or some backend wrote it out to the wrong file; but if we were > > forgetting something like per

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Alvaro Herrera
On 2023-May-07, Thomas Munro wrote: > Did you previously run this same workload on versions < 15 and never > see any problem? 15 gained a new feature CREATE DATABASE ... > STRATEGY=WAL_LOG, which is also the default. I wonder if there is a > bug somewhere near that, though I have no specific ide

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-08 Thread Evgeny Morozov
On 8/05/2023 4:24 am, Michael Paquier wrote: > here are the four things running in parallel so as I can get a failure > in loading a critical index when connecting Wow, that is some amazing detective work! We do indeed create tables during our tests, specifically partitions of tables copied from t

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-07 Thread Michael Paquier
On Mon, May 08, 2023 at 02:46:37PM +1200, Thomas Munro wrote: > That sounds like good news, but I'm still confused: do you see all 0s > in the target database (popo)'s catalogs, as reported (and if so can > you explain how they got there?), or is it regression that is > corrupted in more subtle way

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-07 Thread Thomas Munro
On Mon, May 8, 2023 at 2:24 PM Michael Paquier wrote: > I can reproduce the same backtrace here. That's just my usual laptop > with ext4, so this would be a Postgres bug. First, here are the four > things running in parallel so as I can get a failure in loading a > critical index when connecting

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-07 Thread Michael Paquier
On Sun, May 07, 2023 at 10:30:52PM +1200, Thomas Munro wrote: > Bug-in-PostgreSQL explanations could include that we forgot it was > dirty, or some backend wrote it out to the wrong file; but if we were > forgetting something like permanent or dirty, would there be a more > systematic failure? Oh,

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-07 Thread Thomas Munro
On Mon, May 8, 2023 at 4:10 AM Evgeny Morozov wrote: > On 6/05/2023 11:13 pm, Thomas Munro wrote: > > Would you like to try requesting FILE_COPY for a while and see if it > > eventually happens like that too? > Sure, we can try that. Maybe you could do some one way and some the other, so that we

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-07 Thread Evgeny Morozov
On 6/05/2023 11:13 pm, Thomas Munro wrote: > Did you previously run this same workload on versions < 15 and never > see any problem? Yes, kind of. We have a test suite that creates one test DB and runs a bunch of tests on it. Two of these tests, however, create another DB each (also by cloning the

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-07 Thread Thomas Munro
On Sun, May 7, 2023 at 1:21 PM Tom Lane wrote: > Thomas Munro writes: > > Did you previously run this same workload on versions < 15 and never > > see any problem? 15 gained a new feature CREATE DATABASE ... > > STRATEGY=WAL_LOG, which is also the default. I wonder if there is a > > bug somewhe

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-06 Thread Tom Lane
Thomas Munro writes: > Did you previously run this same workload on versions < 15 and never > see any problem? 15 gained a new feature CREATE DATABASE ... > STRATEGY=WAL_LOG, which is also the default. I wonder if there is a > bug somewhere near that, though I have no specific idea. Per the rel

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-06 Thread Thomas Munro
On Sun, May 7, 2023 at 10:23 AM Jeffrey Walton wrote: > This may be related... I seem to recall the GNUlib folks talking about > a cp bug on sparse files. It looks like it may be fixed in coreutils > release 9.2 (2023-03-20): > https://github.com/coreutils/coreutils/blob/master/NEWS#L233 > > If I

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-06 Thread Jeffrey Walton
On Sat, May 6, 2023 at 6:35 AM Thomas Munro wrote: > > On Sat, May 6, 2023 at 9:58 PM Evgeny Morozov > wrote: > > Right - I should have realised that! base/1414389/2662 is indeed all > > nulls, 32KB of them. I included the file anyway in > > https://objective.realityexists.net/temp/pgstuff2.zip >

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-06 Thread Thomas Munro
On Sun, May 7, 2023 at 12:29 AM Evgeny Morozov wrote: > On 6/05/2023 12:34 pm, Thomas Munro wrote: > > So it does indeed look like something unknown has replaced 32KB of > > data with 32KB of zeroes underneath us. Are there more non-empty > > files that are all-zeroes? Something like this might

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-06 Thread Evgeny Morozov
On 6/05/2023 12:34 pm, Thomas Munro wrote: > So it does indeed look like something unknown has replaced 32KB of > data with 32KB of zeroes underneath us. Are there more non-empty > files that are all-zeroes? Something like this might find them: > > for F in base/1414389/* > do > if [ -s $F ] &&
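The shell loop quoted above is cut off in this archive view. As a rough illustration of the same idea (find files that are non-empty yet consist entirely of zero bytes), here is an independent Python sketch. It is not the script from the original message; the function names `is_all_zero` and `find_all_zero_files` are made up for this example, and the directory to scan (for instance the `base/1414389` directory mentioned in the thread) is supplied by the caller.

```python
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large relation files are not loaded whole


def is_all_zero(path, chunk_size=CHUNK_SIZE):
    """Return True if the file is non-empty and every byte in it is 0x00."""
    size = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
            if any(chunk):  # any() over bytes is True as soon as one byte is non-zero
                return False
    return size > 0  # empty files are deliberately not reported


def find_all_zero_files(directory):
    """Return the sorted regular files directly inside `directory` that are all zeroes."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and is_all_zero(p)
    )
```

Calling `find_all_zero_files` on a copy of the database directory (hypothetical usage) returns the suspicious files. Empty files are skipped because zero-length relation files are normal in a PostgreSQL data directory.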

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-06 Thread Thomas Munro
On Sat, May 6, 2023 at 9:58 PM Evgeny Morozov wrote: > Right - I should have realised that! base/1414389/2662 is indeed all > nulls, 32KB of them. I included the file anyway in > https://objective.realityexists.net/temp/pgstuff2.zip OK so it's not just page 0, you have 32KB or 4 pages of all zero
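The "32KB or 4 pages" figure follows from PostgreSQL's default block size of 8192 bytes (32768 / 8192 = 4). A small Python sketch for checking which pages of a single relation file are entirely zero is shown below. It is only an illustration, not something posted in the thread: the function name `zero_pages` is hypothetical, and the 8192-byte block size is an assumption that holds only for servers built with the default `BLCKSZ`.

```python
BLCKSZ = 8192  # PostgreSQL's default block size; assumed here, adjust for custom builds


def zero_pages(path, block_size=BLCKSZ):
    """Return (zero_page_numbers, total_pages) for the file at `path`.

    A trailing partial page, if any, is counted as a page of its own.
    """
    zero = []
    total = 0
    with open(path, "rb") as f:
        while True:
            page = f.read(block_size)
            if not page:
                break
            if not any(page):  # every byte in this page is 0x00
                zero.append(total)
            total += 1
    return zero, total
```

For a 32 KB file consisting only of zero bytes, such as the `base/1414389/2662` file described above, this would report pages 0 through 3 out of 4.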

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-06 Thread Evgeny Morozov
On 6/05/2023 1:06 am, Thomas Munro wrote: > Next can you share the file base/1414389/2662? ("5" was from the wrong > database.) Right - I should have realised that! base/1414389/2662 is indeed all nulls, 32KB of them. I included the file anyway in https://objective.realityexists.net/temp/pgstuff2

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-05 Thread Thomas Munro
On Fri, May 5, 2023 at 7:50 PM Evgeny Morozov wrote: > The OID of the bad DB ('test_behavior_638186279733138190') is 1414389 and > I've uploaded base/1414389/pg_filenode.map and also base/5/2662 (in case > that's helpful) as https://objective.realityexists.net/temp/pgstuff1.zip Thanks. That pg

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-05 Thread Evgeny Morozov
On 5/05/2023 10:38 am, Andrew Gierth wrote: > sudo -u postgres psql -w -p 5434 -d "options='-P'" > (make that -d "dbname=whatever options='-P'" if you need to specify > some database name; or use PGOPTIONS="-P" in the environment.) Thanks, good to know! Unfortunately that also fails: # sudo -u p

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-05 Thread Andrew Gierth
> "Evgeny" == Evgeny Morozov writes: Evgeny> Indeed, I cannot get that far due to the same error. I read Evgeny> about ignore_system_indexes, but... Evgeny> # sudo -u postgres psql -w -p 5434 -c "set ignore_system_indexes=on"; Evgeny> ERROR:  parameter "ignore_system_indexes" cannot be s

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-05 Thread Evgeny Morozov
On 5/05/2023 2:02 am, Thomas Munro wrote: > On Fri, May 5, 2023 at 11:15 AM Thomas Munro wrote: >> What does select >> pg_relation_filepath('pg_class_oid_index') show in the corrupted >> database, base/5/2662 or something else? > Oh, you can't get that far, but perhaps you could share the > pg_fil

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-04 Thread Thomas Munro
On Fri, May 5, 2023 at 11:15 AM Thomas Munro wrote: > What does select > pg_relation_filepath('pg_class_oid_index') show in the corrupted > database, base/5/2662 or something else? Oh, you can't get that far, but perhaps you could share the pg_filenode.map file? Or alternatively strace -f Postgr

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-04 Thread Thomas Munro
On Fri, May 5, 2023 at 11:15 AM Thomas Munro wrote: > Now *that* is a piece of > logic that changed in PostgreSQL 15. It changed from sector-based > atomicity assumptions to a directory entry swizzling trick, in commit > d8cd0c6c95c0120168df93aae095df4e0682a08a. Hmm. I spoke too soon, that only

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-04 Thread Thomas Munro
On Fri, May 5, 2023 at 6:11 AM Evgeny Morozov wrote: > Meanwhile, what do I do with the existing server, though? Just try to > drop the problematic DBs again manually? That earlier link to a FreeBSD thread is surely about bleeding edge new ZFS stuff that was briefly broken then fixed, being disco

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-04 Thread Ron
On 5/4/23 13:10, Evgeny Morozov wrote: [snip] I'm now thinking of setting up a dedicated AWS EC2 instance just for these little DBs that get created by our automated tests. If the problem happens there as well then that would strongly point towards a bug in PostgreSQL, wouldn't it? Many other p

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-04 Thread Evgeny Morozov
On 4/05/2023 6:42 pm, Laurenz Albe wrote: > On Thu, 2023-05-04 at 15:49 +, Evgeny Morozov wrote: >> Well, the problem happened again! Kind of... This time PG has not >> crashed with the PANIC error in the subject, but pg_dumping certain DBs >> again fails with >> >> >> pg_dump: error: connectio

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-04 Thread Laurenz Albe
On Thu, 2023-05-04 at 15:49 +, Evgeny Morozov wrote: > Well, the problem happened again! Kind of... This time PG has not > crashed with the PANIC error in the subject, but pg_dumping certain DBs > again fails with > > > pg_dump: error: connection to server on socket > "/var/run/postgresql/.s.

Re: "PANIC: could not open critical system index 2662" - twice

2023-05-04 Thread Evgeny Morozov
On 14/04/2023 10:42 am, Alban Hertroys wrote: > Your problem coincides with a thread at freebsd-current with very > similar data corruption after a recent OpenZFS import: blocks of all > zeroes, but also missing files. So, perhaps these problems are related? > Apparently, there was a recent fix for

Re: "PANIC: could not open critical system index 2662" - twice

2023-04-14 Thread Alban Hertroys
> On 14 Apr 2023, at 9:38, Evgeny Morozov wrote: (…) > I don't know whether ZFS zero-fills blocks on disk errors. As I > understood, ZFS should have been able to recover from disk errors (that > were "unrecoverable" at the hardware level) using the data on the other > two disks (which did not

Re: "PANIC: could not open critical system index 2662" - twice

2023-04-14 Thread Evgeny Morozov
> Hmm, I am not certain. The block was filled with zeros from your error > message, and I think such blocks don't trigger a checksum warning. OK, so data_checksums=on might not have made any difference in this case? > So if your disk replaces a valid block with zeros (filesystem check > after cr

Re: "PANIC: could not open critical system index 2662" - twice

2023-04-13 Thread Thorsten Glaser
On Fri, 14 Apr 2023, Laurenz Albe wrote: >So if your disk replaces a valid block with zeros (filesystem check >after crash?), that could explain what you see. Oh, I had that happen on a RAID 1 once. One of the two discs had an intermittent error (write I guess) but didn’t fail out of the RAID, and

Re: "PANIC: could not open critical system index 2662" - twice

2023-04-13 Thread Laurenz Albe
On Thu, 2023-04-13 at 19:07 +, Evgeny Morozov wrote: > On 13/04/2023 5:02 pm, Laurenz Albe wrote: > > It means that if the error is caused by a faulty disk changing your data, > > you'll notice as soon as you touch the page. > > > > That would perhaps not have made a lot of difference in your

Re: "PANIC: could not open critical system index 2662" - twice

2023-04-13 Thread Evgeny Morozov
On 13/04/2023 5:02 pm, Laurenz Albe wrote: > It means that if the error is caused by a faulty disk changing your data, > you'll notice as soon as you touch the page. > > That would perhaps not have made a lot of difference in your case, > except that the error message would have been different and

Re: "PANIC: could not open critical system index 2662" - twice

2023-04-13 Thread Laurenz Albe
On Thu, 2023-04-13 at 06:56 +, Evgeny Morozov wrote: > On 12/04/2023 2:35 am, Michael Paquier wrote: > > initdb does not enable checksums by default, requiring a > > -k/--data-checksums, so likely this addition comes from your > > environment. > > OK, so then what does that mean for the e

Re: "PANIC: could not open critical system index 2662" - twice

2023-04-12 Thread Evgeny Morozov
On 12/04/2023 2:35 am, Michael Paquier wrote: > initdb does not enable checksums by default, requiring a > -k/--data-checksums, so likely this addition comes from your > environment. Indeed, turns out we had it in init_db_options. > However, the docs say "Only >> data pages are protected by

Re: "PANIC: could not open critical system index 2662" - twice

2023-04-11 Thread Michael Paquier
On Tue, Apr 11, 2023 at 04:44:54PM +, Evgeny Morozov wrote: > We have data_checksums=on. (It must be on by default, since I cannot > find that in our config files anywhere.) initdb does not enable checksums by default, requiring a -k/--data-checksums, so likely this addition comes from yo

Re: "PANIC: could not open critical system index 2662" - twice

2023-04-11 Thread Evgeny Morozov
> No idea about the former, but bad hardware is a good enough explanation. > As to keeping it from happening: use good hardware. Alright, thanks, I'll just keep my fingers crossed that it doesn't happen again then! > Also: Use checksums. PostgreSQL offers data checksums[1]. Some filesystems also

Re: "PANIC: could not open critical system index 2662" - twice

2023-04-07 Thread Peter J. Holzer
On 2023-04-07 13:04:34 +0200, Laurenz Albe wrote: > On Thu, 2023-04-06 at 16:41 +, Evgeny Morozov wrote: > > What can I do to figure out why this is happening and prevent it from > > happening again? > > No idea about the former, but bad hardware is a good enough explanation. > > As to keepi

Re: "PANIC: could not open critical system index 2662" - twice

2023-04-07 Thread Michael Paquier
On Fri, Apr 07, 2023 at 01:04:34PM +0200, Laurenz Albe wrote: > On Thu, 2023-04-06 at 16:41 +, Evgeny Morozov wrote: >> Could this be a PG bug? > > It could be, but data corruption caused by bad hardware is much more likely. There is no way to be completely sure here, except if we would be ab

Re: "PANIC: could not open critical system index 2662" - twice

2023-04-07 Thread Laurenz Albe
On Thu, 2023-04-06 at 16:41 +, Evgeny Morozov wrote: >  Our PostgreSQL 15.2 instance running on Ubuntu 18.04 has crashed with this > error: > > 2023-04-05 09:24:03.448 UTC [15227] ERROR:  index "pg_class_oid_index" > contains unexpected zero page at block 0 > [...] > > We had the same thin

"PANIC: could not open critical system index 2662" - twice

2023-04-06 Thread Evgeny Morozov
Our PostgreSQL 15.2 instance running on Ubuntu 18.04 has crashed with this error: 2023-04-05 09:24:03.448 UTC [15227] ERROR:  index "pg_class_oid_index" contains unexpected zero page at block 0 2023-04-05 09:24:03.448 UTC [15227] HINT:  Please REINDEX it. ... 2023-04-05 13:05:25.018 UTC [15437] ro