Re: Adding REPACK [concurrently]

2025-09-03 Thread Antonin Houska
Mihail Nikalayeu wrote: > While testing MVCC-safe version with stress-tests > 007_repack_concurrently_mvcc.pl I encountered some random crashes with > such logs: > > 25-09-02 12:24:40.039 CEST client backend[261907] > 007_repack_concurrently_mvcc.pl ERROR: relcache reference > 0x7715b9f394a8 is

Re: Adding REPACK [concurrently]

2025-09-03 Thread Mihail Nikalayeu
Hello, Álvaro! Alvaro Herrera : > If you want to post secondary patches, please rename them to end in > something like .txt or .nocfbot or whatever. See here: > https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches? Sorry, I missed that. But now it is possible to

Re: Adding REPACK [concurrently]

2025-09-03 Thread Mihail Nikalayeu
Hello! Antonin Houska : > Are you sure the test is complete? I see no occurrence of the REPACK command > in it. Oops, send invalid file. The correct one in attachment. Add_test_for_REPACK_CONCURRENTLY_with_concurrent_modifications.patch_ Description: Binary data

Re: Adding REPACK [concurrently]

2025-09-01 Thread Antonin Houska
Mihail Nikalayeu wrote: > Antonin Houska : > > Are you sure the test is complete? I see no occurrence of the REPACK command > > in it. > Oops, send invalid file. The correct one in attachment. Thanks! The problem was that when removing the original "preserve visibility patch" v12-0005 [1] from

Re: Adding REPACK [concurrently]

2025-08-31 Thread Antonin Houska
Mihail Nikalayeu wrote: > Hello! > > I started an attempt to make a "lightweight" MVCC-safe prototype and > stuck into the "it is not working" issue. > After some debugging I realized Antonin's variant (catalog-mode based) > seems to be broken also... > > And after a few more hours I realized n

Re: Adding REPACK [concurrently]

2025-08-31 Thread Michael Paquier
On Wed, Aug 27, 2025 at 10:22:24AM +0200, Mihail Nikalayeu wrote: > That worries me - it is not the behaviour someone expects from a > database by default. At least the warning should be much more visible > and obvious. > I think most of user will expect the same guarantees as [CREATE|RE] > INDEX C

Re: Adding REPACK [concurrently]

2025-08-31 Thread Alvaro Herrera
On 2025-Aug-31, Mihail Nikalayeu wrote: > I started an attempt to make a "lightweight" MVCC-safe prototype and > stuck into the "it is not working" issue. > After some debugging I realized Antonin's variant (catalog-mode based) > seems to be broken also... > > And after a few more hours I realize

Re: Adding REPACK [concurrently]

2025-08-31 Thread Mihail Nikalayeu
Hello! I started an attempt to make a "lightweight" MVCC-safe prototype and stuck into the "it is not working" issue. After some debugging I realized Antonin's variant (catalog-mode based) seems to be broken also... And after a few more hours I realized non-MVCC is broken as well :) This is a pa

Re: Adding REPACK [concurrently]

2025-08-29 Thread Antonin Houska
Mihail Nikalayeu wrote: > In case of some incident related to that (in a large well-known > company) the typical takeaway for readers of tech blogs will simply be > "some command in Postgres is broken". For _responsible_ users, the message will rather be that "some tech bloggers do not bother to

Re: Adding REPACK [concurrently]

2025-08-28 Thread Mihail Nikalayeu
Hello, Álvaro! > If others are motivated enough to certify it, maybe we can consider it. > But I don't think I'm going to have time to get this part reviewed and > committed in time for 19, so you might need another committer. I don't think it is realistic to involve another committer - it is jus

Re: Adding REPACK [concurrently]

2025-08-28 Thread Alvaro Herrera
On 2025-Aug-21, Mihail Nikalayeu wrote: > Alvaro Herrera : > > I proposed to leave this part out initially, which is why it hasn't been > > reposted. We can review and discuss after the initial patches are in. > > I think it is worth pushing it at least in the same release cycle. If others are

Re: Adding REPACK [concurrently]

2025-08-27 Thread Mihail Nikalayeu
Antonin Houska : > Do you mean the race related to TransactionIdIsInProgress()? Not sure I > understand, as you suggested above that you no longer need the function. The "lightweight" approaches I see so far: * XactLockTableWait before replay + SnapshotSelf(GetLatestSnapshot?) * SnapshotDirty + r

Re: Adding REPACK [concurrently]

2025-08-27 Thread Antonin Houska
Mihail Nikalayeu wrote: > > A new kind of snapshot seems like (much) cleaner solution at the moment. > > Do you mean some kind of snapshot which only uses > TransactionIdDidCommit/Abort ignoring > TransactionIdIsCurrentTransactionId/TransactionIdIsInProgress? > Actually it behaves like SnapshotB

Re: Adding REPACK [concurrently]

2025-08-27 Thread Mihail Nikalayeu
Antonin Houska : > I insist that this is a misuse of TransactionIdIsInProgress(). When dealing > with logical decoding, only WAL should tell whether particular transaction is > still running. AFAICS this is how reorderbuffer.c works. Hm... Maybe, but at the same time we already have SnapshotDirty

Re: Adding REPACK [concurrently]

2025-08-26 Thread Antonin Houska
Mihail Nikalayeu wrote: > Hello, Antonin! > > Antonin Houska : > > > > Where exactly should HeapTupleSatisfiesDirty() conclude that the tuple is > > visible? TransactionIdIsCurrentTransactionId() will not do w/o the > > modifications that you proposed earlier [1] and TransactionIdIsInProgress()

Re: Adding REPACK [concurrently]

2025-08-26 Thread Mihail Nikalayeu
Hello, Antonin! Antonin Houska : > > Where exactly should HeapTupleSatisfiesDirty() conclude that the tuple is > visible? TransactionIdIsCurrentTransactionId() will not do w/o the > modifications that you proposed earlier [1] and TransactionIdIsInProgress() is > not suitable as I explained in [2].

Re: Adding REPACK [concurrently]

2025-08-26 Thread Antonin Houska
Mihail Nikalayeu wrote: > Antonin Houska : > > > Although it could work, I think it'd be confusing to consider the > > transactions > > being replayed as "current" from the point of view of the backend that > > executes REPACK CONCURRENTLY. > > Just realized SnapshotDirty is the thing that fit

Re: Adding REPACK [concurrently]

2025-08-26 Thread Mihail Nikalayeu
Antonin Houska : > Although it could work, I think it'd be confusing to consider the transactions > being replayed as "current" from the point of view of the backend that > executes REPACK CONCURRENTLY. Just realized SnapshotDirty is the thing that fits into the role - it respects not-yet committ

Re: Adding REPACK [concurrently]

2025-08-26 Thread Antonin Houska
Mihail Nikalayeu wrote: > Antonin Houska : > > I think the problem is that HeapTupleSatisfiesSelf() uses > > TransactionIdIsInProgress() instead of checking the snapshot: > > Yes, some issues might be possible for SnapshotSelf. > Possible solution is to override TransactionIdIsCurrentTransaction

Re: Adding REPACK [concurrently]

2025-08-25 Thread Mihail Nikalayeu
Antonin Houska : > I think the problem is that HeapTupleSatisfiesSelf() uses > TransactionIdIsInProgress() instead of checking the snapshot: Yes, some issues might be possible for SnapshotSelf. Possible solution is to override TransactionIdIsCurrentTransactionId to true (like you did with nParalle

Re: Adding REPACK [concurrently]

2025-08-25 Thread Antonin Houska
Mihail Nikalayeu wrote: > Hi, Antonin > > > How does HeapTupleSatisfiesSelf() recognize the status of any XID w/o using > > a > > snapshot? Do you mean by checking the commit log (TransactionIdDidCommit) ? > > Yes, TransactionIdDidCommit. I think the problem is that HeapTupleSatisfiesSelf() u

Re: Adding REPACK [concurrently]

2025-08-25 Thread Antonin Houska
Robert Treat wrote: > On Mon, Aug 25, 2025 at 10:15 AM Mihail Nikalayeu > wrote: > > 1) we have a whole initial table snapshot with all xmin copied from > > the original table. All such xmin are committed. > > 2) appling transaction sees ALL the self-alive (no xmax) tuple in it > > because its x

Re: Adding REPACK [concurrently]

2025-08-25 Thread Robert Treat
On Mon, Aug 25, 2025 at 10:15 AM Mihail Nikalayeu wrote: > 1) we have a whole initial table snapshot with all xmin copied from > the original table. All such xmin are committed. > 2) appling transaction sees ALL the self-alive (no xmax) tuple in it > because its xmin\xmax is committed and Snapshot

Re: Adding REPACK [concurrently]

2025-08-25 Thread Mihail Nikalayeu
Hi, Antonin > How does HeapTupleSatisfiesSelf() recognize the status of any XID w/o using a > snapshot? Do you mean by checking the commit log (TransactionIdDidCommit) ? Yes, TransactionIdDidCommit. Another option is just invent a new snapshot type - SnapshotBelieveEverythingCommitted - for that

Re: Adding REPACK [concurrently]

2025-08-25 Thread Robert Treat
On Sat, Aug 23, 2025 at 10:23 AM Álvaro Herrera wrote: > On 2025-08-23, Michael Banck wrote: > > On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote: > > >> I don't think we need to keep vacuumdb. Packagers can keep a symlink > >> (vacuumdb) > >> to pg_repackdb. We can add a similar war

Re: Adding REPACK [concurrently]

2025-08-25 Thread Antonin Houska
Mihail Nikalayeu wrote: > > Not sure I understand in all details, but I don't think SnapshotSelf is the > > correct snapshot. Note that HeapTupleSatisfiesSelf() does not use its > > 'snapshot' argument at all. Instead, it considers the set of running > > transactions as it is at the time the func

Re: Adding REPACK [concurrently]

2025-08-25 Thread Mihail Nikalayeu
Hi, Antonin! > I assume you are concerned with the patch part 0005 of the v12 patch > ("Preserve visibility information of the concurrent data changes."), aren't > you? Yes, of course. I got an idea while trying to find a way to optimize it. > Not sure I understand in all details, but I don't th

Re: Adding REPACK [concurrently]

2025-08-25 Thread Antonin Houska
Mihail Nikalayeu wrote: > I was looking into catalog-related logical decoding features, and it > seems like they are clearly overkill for the repack case. > We don't need CID tracking or even a snapshot for each commit if we’re > okay with passing xmin/xmax as arguments. I assume you are concern

Re: Adding REPACK [concurrently]

2025-08-24 Thread Mihail Nikalayeu
Hello, Antonin! > When posting version 12 of the patch [1] I raised a concern that the the MVCC > safety is too expensive when it comes to logical decoding. Therefore, I > abandoned the concept for now, and v13 [2] uses plain heap_insert(). Once we > implement the MVCC safety, we simply rewrite th

Re: Adding REPACK [concurrently]

2025-08-23 Thread Álvaro Herrera
On 2025-08-23, Michael Banck wrote: > On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote: >> I don't think we need to keep vacuumdb. Packagers can keep a symlink >> (vacuumdb) >> to pg_repackdb. We can add a similar warning message saying they should use >> pg_repackdb if the symlink

Re: Adding REPACK [concurrently]

2025-08-22 Thread Michael Banck
Hi, On Fri, Aug 22, 2025 at 05:32:34PM -0300, Euler Taveira wrote: > On Fri, Aug 22, 2025, at 6:40 AM, Álvaro Herrera wrote: > > On 2025-Aug-21, Robert Treat wrote: > >> What's the plan for clusterdb? It seems like we'd ideally create a > >> stand alone pg_repackdb which replaces clusterdb and als

Re: Adding REPACK [concurrently]

2025-08-22 Thread Euler Taveira
On Fri, Aug 22, 2025, at 6:40 AM, Álvaro Herrera wrote: > On 2025-Aug-21, Robert Treat wrote: > >> What's the plan for clusterdb? It seems like we'd ideally create a >> stand alone pg_repackdb which replaces clusterdb and also allows us to >> remove the FULL options from vacuumdb. > > I don't think

Re: Adding REPACK [concurrently]

2025-08-22 Thread Álvaro Herrera
On 2025-Aug-21, Robert Treat wrote: > What's the plan for clusterdb? It seems like we'd ideally create a > stand alone pg_repackdb which replaces clusterdb and also allows us to > remove the FULL options from vacuumdb. I don't think we should remove clusterdb, to avoid breaking any scripts that w

Re: Adding REPACK [concurrently]

2025-08-21 Thread Robert Treat
On Tue, Aug 19, 2025 at 2:53 PM Álvaro Herrera wrote: > Note choice of shell command name: though all the other programs in > src/bin/scripts do not use the "pg_" prefix, this one does; we thought > it made no sense to follow the old programs as precedent because there > seems to be a lament for t

Re: Adding REPACK [concurrently]

2025-08-21 Thread Antonin Houska
Andres Freund wrote: > Hi, > > On 2025-08-20 16:22:41 +0200, Antonin Houska wrote: > > Álvaro Herrera wrote: > > > > > On 2025-Aug-20, Antonin Houska wrote: > > > > > > > There's an issue with the symlink, maybe some meson expert can help. In > > > > particular, the CI on Windows ends up with t

Re: Adding REPACK [concurrently]

2025-08-21 Thread Andres Freund
Hi, On 2025-08-21 20:14:14 +0200, Antonin Houska wrote: > ok, installing a copy of the same executable with a different name seems more > reliable. At least that's how the postmaster->postgres link used to be > handled, if I read Makefile correctly. Thanks. I have not followed this thread, but I

Re: Adding REPACK [concurrently]

2025-08-21 Thread Antonin Houska
Mihail Nikalayeu wrote: > Also, I think I found an issue (or lost something during rebase): we > must preserve xmin,cmin during initial copy > to make sure that data is going to be visible by snapshots of > concurrent changes later: > > static void > reform_and_rewrite_tuple(..) > . >

Re: Adding REPACK [concurrently]

2025-08-20 Thread Mihail Nikalayeu
Hello everyone! Alvaro Herrera : > Please note that Antonin already implemented this. See his patches > here: > https://www.postgresql.org/message-id/77690.1725610115%40antos > I proposed to leave this part out initially, which is why it hasn't been > reposted. We can review and discuss after th

Re: Adding REPACK [concurrently]

2025-08-20 Thread Andres Freund
Hi, On 2025-08-20 16:22:41 +0200, Antonin Houska wrote: > Álvaro Herrera wrote: > > > On 2025-Aug-20, Antonin Houska wrote: > > > > > There's an issue with the symlink, maybe some meson expert can help. In > > > particular, the CI on Windows ends up with the following error: > > > > > > ERROR: Tr

Re: Adding REPACK [concurrently]

2025-08-20 Thread Antonin Houska
Álvaro Herrera wrote: > On 2025-Aug-20, Antonin Houska wrote: > > > There's an issue with the symlink, maybe some meson expert can help. In > > particular, the CI on Windows ends up with the following error: > > > > ERROR: Tried to install symlink to missing file > > C:/cirrus/build/tmp_instal

Re: Adding REPACK [concurrently]

2025-08-20 Thread Álvaro Herrera
On 2025-Aug-20, Antonin Houska wrote: > There's an issue with the symlink, maybe some meson expert can help. In > particular, the CI on Windows ends up with the following error: > > ERROR: Tried to install symlink to missing file > C:/cirrus/build/tmp_install/usr/local/pgsql/bin/vacuumdb Hmm, t

Re: Adding REPACK [concurrently]

2025-08-20 Thread Antonin Houska
Álvaro Herrera wrote: > Still on pg_repackdb, the implementation here is to install a symlink > called pg_repackdb which points to vacuumdb, and make the program behave > differently when called in this way. The amount of additional code for > this is relatively small, so I think this is a worth

Re: Adding REPACK [concurrently]

2025-08-20 Thread Antonin Houska
Alvaro Herrera wrote: > On 2025-Aug-16, Robert Treat wrote: > > > On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska wrote: > > > > Now that we want to cover the CLUSTER/VACUUM FULL completely, I've > > > checked the > > > options of VACUUM FULL. I found two items not supported by REPACK (but > >

Re: Adding REPACK [concurrently]

2025-08-19 Thread Alvaro Herrera
On 2025-Aug-16, Robert Treat wrote: > In the v18 patch, the docs say that repack doesn't remember the index, > but it seems we are still calling mark_index_clustered, so I think the > above is true but we need to update the docs(?). Yes, the docs are obsolete on this point, I'm in the process of

Re: Adding REPACK [concurrently]

2025-08-19 Thread Alvaro Herrera
On 2025-Aug-16, Robert Treat wrote: > On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska wrote: > > Now that we want to cover the CLUSTER/VACUUM FULL completely, I've checked > > the > > options of VACUUM FULL. I found two items not supported by REPACK (but also > > not supported by by CLUSTER): ANA

Re: Adding REPACK [concurrently]

2025-08-16 Thread Robert Treat
On Tue, Aug 5, 2025 at 4:59 AM Antonin Houska wrote: > > Alvaro Herrera wrote: > > > I made a few changes from Antonin's original at [2]. First, I modified > > the grammar to support "REPACK [tab] USING INDEX" without specifying the > > index name. With this change, all possibilities of the old

Re: Adding REPACK [concurrently]

2025-08-15 Thread Alvaro Herrera
On 2025-Aug-15, Antonin Houska wrote: > This is v18 again. Thanks for this! > Parts 0001 through 0004 are unchanged, however 0005 is added. It > implements a new client application pg_repackdb. (If I posted 0005 > alone its regression tests would not work. I wonder if the cfbot > handles the rep

Re: Adding REPACK [concurrently]

2025-08-09 Thread Alvaro Herrera
On 2025-Aug-09, Mihail Nikalayeu wrote: > Hello! > > One more thing - I think build_new_indexes and > index_concurrently_create_copy are very close in semantics, so it > might be a good idea to refactor them a bit. > > I’m still concerned about MVCC-related issues. For multiple > applications, t

Re: Adding REPACK [concurrently]

2025-08-09 Thread Mihail Nikalayeu
Hello! One more thing - I think build_new_indexes and index_concurrently_create_copy are very close in semantics, so it might be a good idea to refactor them a bit. I’m still concerned about MVCC-related issues. For multiple applications, this is a dealbreaker, because in some cases correctness i

Re: Adding REPACK [concurrently]

2025-08-05 Thread Antonin Houska
Alvaro Herrera wrote: > I made a few changes from Antonin's original at [2]. First, I modified > the grammar to support "REPACK [tab] USING INDEX" without specifying the > index name. With this change, all possibilities of the old commands are > covered, ... > Here's a list of existing comman

Re: Adding REPACK [concurrently]

2025-08-04 Thread Mihail Nikalayeu
Hello Álvaro, Should we skip non-ready indexes in build_new_indexes? Also, in the current implementation, concurrent mode is marked as non-MVCC safe. From my point of view, this is a significant limitation for practical use. Should we consider an option to exchange non-MVCC issues to short exclus

Re: Adding REPACK [concurrently]

2025-08-01 Thread Fujii Masao
On Fri, Aug 1, 2025 at 1:50 AM Alvaro Herrera wrote: > One of the later patches in the series, which I have not included yet, > intends to implement the idea of transiently enabling wal_level=logical > for the table being repacked concurrently, so that you can still use > the concurrent mode if yo

Re: Adding REPACK [concurrently]

2025-07-26 Thread Fujii Masao
On Sun, Jul 27, 2025 at 6:56 AM Alvaro Herrera wrote: > > Hello, > > Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it. > This is coming from [1]. The ultimate goal is to have an in-core tool > to allow concurrent table rewrite to get rid of bloat; +1 > right now, VACUUM

Re: Adding REPACK [concurrently]

2025-07-26 Thread Robert Treat
On Sat, Jul 26, 2025 at 5:56 PM Alvaro Herrera wrote: > > Hello, > > Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it. > This is coming from [1]. The ultimate goal is to have an in-core tool > to allow concurrent table rewrite to get rid of bloat; right now, VACUUM > FULL d

Adding REPACK [concurrently]

2025-07-26 Thread Alvaro Herrera
Hello, Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it. This is coming from [1]. The ultimate goal is to have an in-core tool to allow concurrent table rewrite to get rid of bloat; right now, VACUUM FULL does that, but it's not concurrent. Users have resorted to using the