Re: decoupling table and index vacuum

2022-02-10 Thread Peter Geoghegan
On Thu, Feb 10, 2022 at 11:14 AM Robert Haas wrote: > Hmm. I think you're vastly overestimating the extent to which it's > possible to spread out and reschedule the work. I don't know which of > us is wrong. From my point of view, if VACUUM is going to do a full > phase 1 heap pass and a full

Re: decoupling table and index vacuum

2022-02-10 Thread Peter Geoghegan
On Thu, Feb 10, 2022 at 11:16 AM Robert Haas wrote: > On Thu, Feb 10, 2022 at 3:10 AM Dilip Kumar wrote: > > Actually I was not worried about the scan getting slow. What I was > > worried about is if we keep ignoring the dead tuples for long time > > then in the worst case if we have huge

Re: decoupling table and index vacuum

2022-02-10 Thread Robert Haas
On Thu, Feb 10, 2022 at 3:10 AM Dilip Kumar wrote: > Actually I was not worried about the scan getting slow. What I was > worried about is if we keep ignoring the dead tuples for long time > then in the worst case if we have huge number of dead tuples in the > index maybe 80% to 90% and then

Re: decoupling table and index vacuum

2022-02-10 Thread Robert Haas
On Wed, Feb 9, 2022 at 6:18 PM Peter Geoghegan wrote: > You seem to be vastly underestimating the value in being able to > spread out and reschedule the work, and manage costs more generally. Hmm. I think you're vastly overestimating the extent to which it's possible to spread out and reschedule

Re: decoupling table and index vacuum

2022-02-10 Thread Dilip Kumar
On Wed, Feb 9, 2022 at 7:43 PM Robert Haas wrote: > > On Wed, Feb 9, 2022 at 1:18 AM Dilip Kumar wrote: > I think that dead index tuples really don't matter if they're going to > get removed anyway before a page split happens. In particular, if > we're going to do a bottom-up index deletion

Re: decoupling table and index vacuum

2022-02-09 Thread Peter Geoghegan
On Wed, Feb 9, 2022 at 1:41 PM Robert Haas wrote: > I'm not sure that we can. I mean, there's still only going to be ~3 > autovacuum workers, and there could be arbitrarily many tables. Even > if the vacuum load is within the bounds of what the system can > sustain, individual tables can't be

Re: decoupling table and index vacuum

2022-02-09 Thread Robert Haas
On Wed, Feb 9, 2022 at 2:27 PM Peter Geoghegan wrote: > We should probably dispense with the idea that we'll be making these > decisions about what to do with an index like this (bloated in a way > that bottom-up index deletion just won't help with) in an environment > that is similar to how the

Re: decoupling table and index vacuum

2022-02-09 Thread Peter Geoghegan
On Wed, Feb 9, 2022 at 6:13 AM Robert Haas wrote: > Just to be clear, when I say that the dead index tuples don't matter > here, I mean from the point of view of the index. From the point of > view of the table, the presence of dead index tuples (or even the > potential presence of dead tuples)

Re: decoupling table and index vacuum

2022-02-09 Thread Robert Haas
On Wed, Feb 9, 2022 at 1:18 AM Dilip Kumar wrote: > I agree with the point that we should be focusing more on index size > growth compared to dead tuples. But I don't think that we can > completely ignore the number of dead tuples. Although we have the > bottom-up index deletion but whether the

Re: decoupling table and index vacuum

2022-02-08 Thread Dilip Kumar
On Wed, Feb 9, 2022 at 1:21 AM Peter Geoghegan wrote: > > The btree side of this shouldn't care at all about dead tuples (in > general we focus way too much on dead tuples, and way too little on > pages). With bottom-up index deletion the number of dead tuples in the > index is just about

Re: decoupling table and index vacuum

2022-02-08 Thread Dilip Kumar
On Tue, Feb 8, 2022 at 10:42 PM Peter Geoghegan wrote: > > On Sun, Feb 6, 2022 at 11:25 PM Dilip Kumar wrote: > > > One thing we could try doing in order to make that easier would be: > > > tweak things so that when autovacuum vacuums the table, it only > > > vacuums the indexes if they meet

Re: decoupling table and index vacuum

2022-02-08 Thread Peter Geoghegan
On Tue, Feb 8, 2022 at 10:58 AM Robert Haas wrote: > Right, that's why I asked the question. If we're going to ask the > index AM whether it would like to be vacuumed right now, we're going > to have to put some logic into the index AM that knows how to answer > that question. But if we don't

Re: decoupling table and index vacuum

2022-02-08 Thread Robert Haas
On Tue, Feb 8, 2022 at 12:50 PM Peter Geoghegan wrote: > > It's not clear to me that we have enough information to make good > > decisions about which indexes to vacuum and which indexes to skip. > > What if "extra vacuuming, not skipping vacuuming" was not just an > abstract goal, but an actual

Re: decoupling table and index vacuum

2022-02-08 Thread Peter Geoghegan
On Tue, Feb 8, 2022 at 9:33 AM Robert Haas wrote: > On Tue, Feb 8, 2022 at 12:12 PM Peter Geoghegan wrote: > > I believe that the main benefit of the dead TID conveyor belt (outside > > of global index use cases) will be to enable us to do more (much more) > > index vacuuming for one index in

Re: decoupling table and index vacuum

2022-02-08 Thread Robert Haas
On Tue, Feb 8, 2022 at 12:12 PM Peter Geoghegan wrote: > I believe that the main benefit of the dead TID conveyor belt (outside > of global index use cases) will be to enable us to do more (much more) > index vacuuming for one index in particular. So it's not really about > doing less index

Re: decoupling table and index vacuum

2022-02-08 Thread Peter Geoghegan
On Sun, Feb 6, 2022 at 11:25 PM Dilip Kumar wrote: > > One thing we could try doing in order to make that easier would be: > > tweak things so that when autovacuum vacuums the table, it only > > vacuums the indexes if they meet some threshold for bloat. I'm not > > sure exactly what happens with

Re: decoupling table and index vacuum

2022-02-06 Thread Dilip Kumar
On Fri, Feb 4, 2022 at 11:45 PM Robert Haas wrote: > > On Wed, Jan 26, 2022 at 8:58 AM Dilip Kumar wrote: > > TODO: > > - This is just a POC patch to discuss the design idea and needs a lot > > of improvement and testing. > > - We are using a slightly different format for storing the dead tids >

Re: decoupling table and index vacuum

2022-02-04 Thread Peter Geoghegan
On Fri, Feb 4, 2022 at 1:54 PM Robert Haas wrote: > > The constantly modified index will be entirely dependent on index > > vacuuming here, and so an improved VACUUM design that allows that > > particular index to be vacuumed more frequently could really improve > > performance. > > Thanks for

Re: decoupling table and index vacuum

2022-02-04 Thread Robert Haas
On Fri, Feb 4, 2022 at 1:46 PM Peter Geoghegan wrote: > That should work. All you need is a table with several indexes, and a > workload consisting of updates that modify a column that is the key > column for only one of the indexes. I would expect bottom-up index > deletion to be 100% effective

Re: decoupling table and index vacuum

2022-02-04 Thread Peter Geoghegan
On Fri, Feb 4, 2022 at 1:15 PM Robert Haas wrote: > My second thought was that perhaps we can create a test scenario > where, in one index, the deduplication and bottom-up index deletion > and kill_prior_tuple mechanisms are very effective, and in another > index, it's not effective at all. For

Re: decoupling table and index vacuum

2022-02-04 Thread Robert Haas
On Wed, Jan 26, 2022 at 8:58 AM Dilip Kumar wrote: > TODO: > - This is just a POC patch to discuss the design idea and needs a lot > of improvement and testing. > - We are using a slightly different format for storing the dead tids > into the conveyor belt which is explained in the patch but the

Re: decoupling table and index vacuum

2022-01-26 Thread Dilip Kumar
On Mon, Sep 27, 2021 at 8:48 AM Masahiko Sawada wrote: > Hi, Here is the first WIP patch for the decoupling table and index vacuum. The first mail of the thread has already explained the complete background of why we want to do this so instead of describing that I will directly j

Re: decoupling table and index vacuum

2021-09-26 Thread Masahiko Sawada
On Sat, Sep 25, 2021 at 10:17 AM Peter Geoghegan wrote: > > On Thu, Sep 23, 2021 at 10:42 PM Masahiko Sawada > wrote: > > On Thu, Sep 16, 2021 at 7:09 AM Peter Geoghegan wrote: > > > Enabling index-only scans is a good enough reason to pursue this > > > project, even on its own. > > > > +1 > >

Re: decoupling table and index vacuum

2021-09-24 Thread Peter Geoghegan
On Fri, Sep 24, 2021 at 7:44 PM Peter Geoghegan wrote: > The scheduling of autovacuum is itself a big problem for the two big > BenchmarkSQL tables I'm always going on about -- though it did get a > lot better with the introduction of the > autovacuum_vacuum_insert_scale_factor stuff in Postgres

Re: decoupling table and index vacuum

2021-09-24 Thread Peter Geoghegan
On Fri, Sep 24, 2021 at 11:48 AM Robert Haas wrote: > Actually, I have. I've been focusing on trying to create a general > infrastructure for conveyor belt storage. An incomplete and likely > quite buggy version of this can be found here: > >

Re: decoupling table and index vacuum

2021-09-24 Thread Peter Geoghegan
On Thu, Sep 23, 2021 at 10:42 PM Masahiko Sawada wrote: > On Thu, Sep 16, 2021 at 7:09 AM Peter Geoghegan wrote: > > Enabling index-only scans is a good enough reason to pursue this > > project, even on its own. > > +1 I was hoping that you might be able to work on opportunistically freezing

Re: decoupling table and index vacuum

2021-09-24 Thread Robert Haas
On Wed, Sep 15, 2021 at 6:08 PM Peter Geoghegan wrote: > Have you started any work on this project? I think that it's a very good idea. Actually, I have. I've been focusing on trying to create a general infrastructure for conveyor belt storage. An incomplete and likely quite buggy version of

Re: decoupling table and index vacuum

2021-09-23 Thread Masahiko Sawada
On Thu, Sep 16, 2021 at 7:09 AM Peter Geoghegan wrote: > > On Wed, Apr 21, 2021 at 8:21 AM Robert Haas wrote: > > Now, the reason for this is that when we discover dead TIDs, we only > > record them in memory, not on disk. So, as soon as VACUUM ends, we > > lose all knowledge of whether those

Re: decoupling table and index vacuum

2021-09-15 Thread Peter Geoghegan
On Wed, Apr 21, 2021 at 8:21 AM Robert Haas wrote: > Now, the reason for this is that when we discover dead TIDs, we only > record them in memory, not on disk. So, as soon as VACUUM ends, we > lose all knowledge of whether those TIDs were and must rediscover > them. Suppose we didn't do this, and

Re: decoupling table and index vacuum

2021-06-23 Thread Antonin Houska
Andres Freund wrote: > On 2021-04-21 11:21:31 -0400, Robert Haas wrote: > > This scheme adds a lot of complexity, which is a concern, but it seems > > to me that it might have several benefits. One is concurrency. You > > could have one process gathering dead TIDs and adding them to the > >

Re: decoupling table and index vacuum

2021-05-06 Thread Dilip Kumar
On Thu, 6 May 2021 at 4:12 PM, Masahiko Sawada wrote: > On Thu, May 6, 2021 at 7:19 PM Robert Haas wrote: > > > > On Thu, May 6, 2021 at 5:02 AM Masahiko Sawada > wrote: > > > Not sure we will need to hold buffer locks for both the TID fork and > > > the heap at the same time but I agree that

Re: decoupling table and index vacuum

2021-05-06 Thread Masahiko Sawada
On Thu, May 6, 2021 at 7:19 PM Robert Haas wrote: > > On Thu, May 6, 2021 at 5:02 AM Masahiko Sawada wrote: > > Not sure we will need to hold buffer locks for both the TID fork and > > the heap at the same time but I agree that we could need to lock on > > multiple TID fork buffers. We could

Re: decoupling table and index vacuum

2021-05-06 Thread Robert Haas
On Thu, May 6, 2021 at 5:02 AM Masahiko Sawada wrote: > Not sure we will need to hold buffer locks for both the TID fork and > the heap at the same time but I agree that we could need to lock on > multiple TID fork buffers. We could need to add dead TIDs to up to two > pages for the TID fork

Re: decoupling table and index vacuum

2021-05-06 Thread Masahiko Sawada
On Thu, May 6, 2021 at 3:38 PM Dilip Kumar wrote: > > On Thu, May 6, 2021 at 8:27 AM Masahiko Sawada wrote: > > > > > I'm doubtful about skipping WAL logging entirely - I'd have to think > > > harder about it, but I think that'd mean we'd restart from scratch after > > > crashes / immediate

Re: decoupling table and index vacuum

2021-05-06 Thread Dilip Kumar
On Thu, May 6, 2021 at 8:27 AM Masahiko Sawada wrote: > > > I'm doubtful about skipping WAL logging entirely - I'd have to think > > harder about it, but I think that'd mean we'd restart from scratch after > > crashes / immediate restarts as well, because we couldn't rely on the > > contents of

Re: decoupling table and index vacuum

2021-05-05 Thread Masahiko Sawada
On Fri, Apr 23, 2021 at 5:01 AM Andres Freund wrote: > > Hi, > > On 2021-04-22 12:15:27 -0400, Robert Haas wrote: > > On Wed, Apr 21, 2021 at 5:38 PM Andres Freund wrote: > > > I'm not sure that's the only way to deal with this. While some form of > > > generic "conveyor belt" infrastructure

Re: decoupling table and index vacuum

2021-04-25 Thread Masahiko Sawada
On Sat, Apr 24, 2021 at 12:22 AM Robert Haas wrote: > > On Fri, Apr 23, 2021 at 7:04 AM Masahiko Sawada wrote: > > I think we can divide the TID fork into 16MB or 32MB chunks like WAL > > segment files so that we can easily remove old chunks. Regarding the > > efficient search part, I think we

Re: decoupling table and index vacuum

2021-04-24 Thread Peter Geoghegan
On Sat, Apr 24, 2021 at 1:17 PM Peter Geoghegan wrote: > On Sat, Apr 24, 2021 at 12:56 PM Andres Freund wrote: > > Imo the question isn't really whether criteria will ever do something > > wrong, but how often and how consequential such mistakes will > > be. E.g. unnecessarily vacuuming an index

Re: decoupling table and index vacuum

2021-04-24 Thread Peter Geoghegan
On Sat, Apr 24, 2021 at 12:56 PM Andres Freund wrote: > Did anybody actually argue for using #live entries directly? I think > *dead* entries is more relevant, particularly because various forms of > local cleanup can be taken into account. Live tuples might come in to > put the number of dead

Re: decoupling table and index vacuum

2021-04-24 Thread Andres Freund
Hi, On 2021-04-24 11:59:29 -0700, Peter Geoghegan wrote: > The number of live tuples (or even dead tuples) in the whole entire > index is simply not a useful proxy for what actually matters -- this > is 100% clear. Did anybody actually argue for using #live entries directly? I think *dead*

Re: decoupling table and index vacuum

2021-04-24 Thread Peter Geoghegan
On Sat, Apr 24, 2021 at 11:43 AM Andres Freund wrote: > I don't see how that's good enough as a general approach. It won't work > on indexes that insert on one end, delete from the other (think > inserted_at or serial primary keys in many workloads). That can be treated as another extreme that

Re: decoupling table and index vacuum

2021-04-24 Thread Andres Freund
Hi, On 2021-04-24 11:21:49 -0700, Peter Geoghegan wrote: > To expand on this a bit, my objection to counting the number of live > tuples in the index (as a means to determining how aggressively each > individual index needs to be vacuumed) is this: it's driven by > positive feedback, not negative

Re: decoupling table and index vacuum

2021-04-24 Thread Peter Geoghegan
On Fri, Apr 23, 2021 at 1:04 PM Peter Geoghegan wrote: > I think that a simple heuristic could work very well here, but it > needs to be at least a little sensitive to the extremes. And I mean > all of the extremes, not just the one from my example -- every > variation exists and will cause

Re: decoupling table and index vacuum

2021-04-23 Thread Peter Geoghegan
On Thu, Apr 22, 2021 at 1:01 PM Andres Freund wrote: > The gin case seems a bit easier than the partial index case. Keeping > stats about the number of new entries in a GIN index doesn't seem too > hard, nor does tracking the number of cleaned up index entries. But > knowing which indexes are

Re: decoupling table and index vacuum

2021-04-23 Thread Peter Geoghegan
On Fri, Apr 23, 2021 at 8:44 AM Robert Haas wrote: > On Thu, Apr 22, 2021 at 4:52 PM Peter Geoghegan wrote: > > Mostly what I'm saying is that I would like to put together a rough > > list of things that we could do to improve VACUUM along the lines > > we've discussed -- all of which stem from

Re: decoupling table and index vacuum

2021-04-23 Thread Robert Haas
On Thu, Apr 22, 2021 at 4:52 PM Peter Geoghegan wrote: > Mostly what I'm saying is that I would like to put together a rough > list of things that we could do to improve VACUUM along the lines > we've discussed -- all of which stem from $SUBJECT. There are > literally dozens of goals (some of

Re: decoupling table and index vacuum

2021-04-23 Thread Robert Haas
On Fri, Apr 23, 2021 at 7:04 AM Masahiko Sawada wrote: > I think we can divide the TID fork into 16MB or 32MB chunks like WAL > segment files so that we can easily remove old chunks. Regarding the > efficient search part, I think we need to consider the case where the > TID fork gets bigger than

Re: decoupling table and index vacuum

2021-04-23 Thread Masahiko Sawada
On Fri, Apr 23, 2021 at 3:47 AM Robert Haas wrote: > > On Thu, Apr 22, 2021 at 10:28 AM Masahiko Sawada > wrote: > > The dead TID fork needs to also be efficiently searched. If the heap > > scan runs twice, the collected dead TIDs on each heap pass could be > > overlapped. But we would not be

Getting rid of freezing and hint bits by eagerly vacuuming aborted xacts (was: decoupling table and index vacuum)

2021-04-22 Thread Peter Geoghegan
On Thu, Apr 22, 2021 at 3:52 PM Peter Geoghegan wrote: > On Thu, Apr 22, 2021 at 11:16 AM Robert Haas wrote: > > > My most ambitious goal is finding a way to remove the need to freeze > > > or to set hint bits. I think that we can do this by inventing a new > > > kind of VACUUM just for aborted

Re: decoupling table and index vacuum

2021-04-22 Thread Peter Geoghegan
On Thu, Apr 22, 2021 at 11:16 AM Robert Haas wrote: > > My most ambitious goal is finding a way to remove the need to freeze > > or to set hint bits. I think that we can do this by inventing a new > > kind of VACUUM just for aborted transactions, which doesn't do index > > vacuuming. You'd need

Re: decoupling table and index vacuum

2021-04-22 Thread Peter Geoghegan
On Thu, Apr 22, 2021 at 12:27 PM Robert Haas wrote: > I agree strongly with this. In fact, I seem to remember saying similar > things to you in the past. If something wins $1 in 90% of cases and > loses $5 in 10% of cases, is it a good idea? Well, it depends on how > the losses are distributed.

Re: decoupling table and index vacuum

2021-04-22 Thread Andres Freund
Hi, On 2021-04-22 12:15:27 -0400, Robert Haas wrote: > On Wed, Apr 21, 2021 at 5:38 PM Andres Freund wrote: > > I'm not sure that's the only way to deal with this. While some form of > > generic "conveyor belt" infrastructure would be a useful building block, > > and it'd be sensible to use it

Re: decoupling table and index vacuum

2021-04-22 Thread Robert Haas
On Thu, Apr 22, 2021 at 3:11 PM Peter Geoghegan wrote: > I think that everybody's beliefs about VACUUM tend to be correct. It > almost doesn't matter if scenario A is the problem in 90% of cases > versus 10% of cases for scenario B (or vice-versa). What actually > matters is that we have good

Re: decoupling table and index vacuum

2021-04-22 Thread Peter Geoghegan
On Thu, Apr 22, 2021 at 11:44 AM Andres Freund wrote: > I'm honestly getting a bit annoyed about this stuff. You're easily annoyed. > Yes it's a cool > improvement, but no, it doesn't mean that there aren't still relevant > issues in important cases. It doesn't help that you repeatedly imply >

Re: decoupling table and index vacuum

2021-04-22 Thread Andres Freund
Hi, On 2021-04-22 14:47:14 -0400, Robert Haas wrote: > On Thu, Apr 22, 2021 at 10:28 AM Masahiko Sawada > wrote: > > Right. Given decoupling index vacuuming, I think the index’s garbage > > statistics are important which preferably need to be fetchable without > > accessing indexes. It would be

Re: decoupling table and index vacuum

2021-04-22 Thread Robert Haas
On Thu, Apr 22, 2021 at 10:28 AM Masahiko Sawada wrote: > The dead TID fork needs to also be efficiently searched. If the heap > scan runs twice, the collected dead TIDs on each heap pass could be > overlapped. But we would not be able to merge them if we did index > vacuuming on one of indexes

Re: decoupling table and index vacuum

2021-04-22 Thread Andres Freund
Hi, On 2021-04-22 11:30:21 -0700, Peter Geoghegan wrote: > I think that you're both missing very important subtleties here. > Apparently the "quantitative vs qualitative" distinction I like to > make hasn't cleared it up. I'm honestly getting a bit annoyed about this stuff. Yes it's a cool

Re: decoupling table and index vacuum

2021-04-22 Thread Robert Haas
On Thu, Apr 22, 2021 at 7:51 AM Dilip Kumar wrote: > How do we decide this target, I mean at a given point how do we decide > that what is the limit of dead TID's at which we want to trigger the > index vacuuming? It's tricky. Essentially, it's a cost-benefit analysis. On the "cost" side, the

Re: decoupling table and index vacuum

2021-04-22 Thread Peter Geoghegan
On Thu, Apr 22, 2021 at 9:15 AM Robert Haas wrote: > > Have you thought about how we would do the scheduling of vacuums for the > > different indexes? We don't really have useful stats for the number of > > dead index entries to be expected in an index. It'd not be hard to track > > how many

Re: decoupling table and index vacuum

2021-04-22 Thread Robert Haas
On Wed, Apr 21, 2021 at 7:51 PM Peter Geoghegan wrote: > I'm very happy to see that you've taken an interest in this work! I > believe it's an important area. It's too important to be left to only > two contributors. I welcome your participation as an equal partner in > the broader project to fix

Re: decoupling table and index vacuum

2021-04-22 Thread Robert Haas
On Wed, Apr 21, 2021 at 5:38 PM Andres Freund wrote: > I'm not sure that's the only way to deal with this. While some form of > generic "conveyor belt" infrastructure would be a useful building block, > and it'd be sensible to use it here if it existed, it seems feasible to > dead tids in a

Re: decoupling table and index vacuum

2021-04-22 Thread Masahiko Sawada
On Thu, Apr 22, 2021 at 8:51 AM Peter Geoghegan wrote: > > On Wed, Apr 21, 2021 at 8:21 AM Robert Haas wrote: > > We are used to thinking about table vacuum and index vacuum as parts > > of a single, indivisible operation. You vacuum the table -- among > > other things by performing HOT pruning

Re: decoupling table and index vacuum

2021-04-22 Thread Dilip Kumar
On Wed, Apr 21, 2021 at 8:51 PM Robert Haas wrote: > Now, the reason for this is that when we discover dead TIDs, we only > record them in memory, not on disk. So, as soon as VACUUM ends, we > lose all knowledge of whether those TIDs were and must rediscover > them. Suppose we didn't do this,

Re: decoupling table and index vacuum

2021-04-21 Thread Peter Geoghegan
On Wed, Apr 21, 2021 at 8:21 AM Robert Haas wrote: > We are used to thinking about table vacuum and index vacuum as parts > of a single, indivisible operation. You vacuum the table -- among > other things by performing HOT pruning and remembering dead TIDs -- > and then you vacuum the indexes --

Re: decoupling table and index vacuum

2021-04-21 Thread Andres Freund
Hi, On 2021-04-21 11:21:31 -0400, Robert Haas wrote: > Opportunistic index cleanup strategies like > kill_prior_tuple and bottom-up deletion may work much better for some > indexes than others, meaning that you could have some indexes that > badly need to be vacuumed because they are full of

decoupling table and index vacuum

2021-04-21 Thread Robert Haas
Hi, We are used to thinking about table vacuum and index vacuum as parts of a single, indivisible operation. You vacuum the table -- among other things by performing HOT pruning and remembering dead TIDs -- and then you vacuum the indexes -- removing the remembered TIDs from the index -- and then