Re: index prefetching

2025-07-19 Thread Thomas Munro
On Sat, Jul 19, 2025 at 11:23 PM Tomas Vondra wrote: > Thanks for the link. It seems I came up with an almost the same patch, > with three minor differences: > > 1) There's another place that sets "distance = 0" in > read_stream_next_buffer, so maybe this should preserve the distance too? > > 2) I

Re: index prefetching

2025-07-19 Thread Tomas Vondra
On 7/19/25 06:03, Thomas Munro wrote: > On Sat, Jul 19, 2025 at 6:31 AM Tomas Vondra wrote: >> Perhaps the ReadStream should do something like this? Of course, the >> simple patch resets the stream very often, likely mcuh more often than >> anything else in the code. But wouldn't it be beneficial

Re: index prefetching

2025-07-18 Thread Thomas Munro
On Sat, Jul 19, 2025 at 6:31 AM Tomas Vondra wrote: > Perhaps the ReadStream should do something like this? Of course, the > simple patch resets the stream very often, likely mcuh more often than > anything else in the code. But wouldn't it be beneficial for streams > reset because of a rescan? Po

Re: index prefetching

2025-07-18 Thread Peter Geoghegan
On Fri, Jul 18, 2025 at 10:47 PM Andres Freund wrote: > > I think that the table AM probably needs to have its own definition of > > a batch (or some other distinct phrase/concept) -- it's not > > necessarily the same group of TIDs that are associated with a batch on > > the index AM side. > > I a

Re: index prefetching

2025-07-18 Thread Andres Freund
Hi, On 2025-07-18 17:44:26 -0400, Peter Geoghegan wrote: > On Fri, Jul 18, 2025 at 4:52 PM Andres Freund wrote: > > I don't agree with that. For efficiency reasons alone table AMs should get a > > whole batch of TIDs at once. If you have an ordered indexscan that returns > > TIDs that are correla

Re: index prefetching

2025-07-18 Thread Peter Geoghegan
On Fri, Jul 18, 2025 at 4:52 PM Andres Freund wrote: > I don't agree with that. For efficiency reasons alone table AMs should get a > whole batch of TIDs at once. If you have an ordered indexscan that returns > TIDs that are correlated with the table, we waste *tremendous* amount of > cycles right

Re: index prefetching

2025-07-18 Thread Andres Freund
Hi, On 2025-07-18 19:44:51 +0200, Tomas Vondra wrote: > I agree tableam needs to have a say in this, so that it can interpret > the TIDs in a way that fits how it actually stores data. But I'm not > sure it should be responsible for calling index_batch_getnext(). Isn't > the batching mostly an "im

Re: index prefetching

2025-07-18 Thread Peter Geoghegan
On Fri, Jul 18, 2025 at 1:44 PM Tomas Vondra wrote: > I agree tableam needs to have a say in this, so that it can interpret > the TIDs in a way that fits how it actually stores data. But I'm not > sure it should be responsible for calling index_batch_getnext(). Isn't > the batching mostly an "impl

Re: index prefetching

2025-07-18 Thread Tomas Vondra
Hi, I was wondering why the "simple" approach performs so much worse than the "complex" one on some of the data sets. The theory was that it's due to using read_stream_reset(), which resets the prefetch distance, and so we need to "ramp up" from scratch (distance=1) for every batch. Which for the

Re: index prefetching

2025-07-18 Thread Tomas Vondra
On 7/17/25 00:33, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 6:18 PM Andres Freund wrote: >> There's no problem today - the indexams never use the tids to look up blocks >> themselves. They're always passed to the tableam to do so (via >> table_index_fetch_tuple() etc). I.e. the translation

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 6:18 PM Andres Freund wrote: > There's no problem today - the indexams never use the tids to look up blocks > themselves. They're always passed to the tableam to do so (via > table_index_fetch_tuple() etc). I.e. the translation from TIDs to specific > blocks & buffers happe

Re: index prefetching

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 17:47:53 -0400, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 5:41 PM Andres Freund wrote: > > I don't mean the index tids, but how the read stream is fed block numbers. > > In > > the "complex" patch that's done by index_scan_stream_read_next(). And the > > block number it

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 5:41 PM Andres Freund wrote: > I don't mean the index tids, but how the read stream is fed block numbers. In > the "complex" patch that's done by index_scan_stream_read_next(). And the > block number it returns is simply > > return ItemPointerGetBlockNumber(tid); > >

Re: index prefetching

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 17:27:23 -0400, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 4:46 PM Andres Freund wrote: > > Maybe I'm missing something, but the current interface doesn't seem to work > > for AMs that don't have a 1:1 mapping between the block number portion of > > the > > tid and the ac

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 4:46 PM Andres Freund wrote: > Maybe I'm missing something, but the current interface doesn't seem to work > for AMs that don't have a 1:1 mapping between the block number portion of the > tid and the actual block number? I'm not completely sure what you mean here. Even w

Re: index prefetching

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 16:54:06 -0400, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 3:40 PM Andres Freund wrote: > > As a first thing I just wanted to get a feel for the improvements we can > > get. > > I had a scale 5 tpch already loaded, so I ran a bogus query on that to see. > > Cool. > > >

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 4:46 PM Andres Freund wrote: > Currently the API wouldn't easily allow the table AM to do batched TID lookups > - if you have a query that looks at a lot of table tuples in the same buffer > consecutively, we spend a lot of time locking/unlocking said buffer. We also > spe

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 19:56, Tomas Vondra wrote: > On 7/16/25 18:39, Peter Geoghegan wrote: >> On Wed, Jul 16, 2025 at 11:29 AM Peter Geoghegan wrote: >>> For example, with "linear_10 / eic=16 / sync", it looks like "complex" >>> has about half the latency of "simple" in tests where selectivity is >>> 10.

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 3:40 PM Andres Freund wrote: > As a first thing I just wanted to get a feel for the improvements we can get. > I had a scale 5 tpch already loaded, so I ran a bogus query on that to see. Cool. > Test: > > Peter's: To be clear, the "complex" patch is still almost all Toma

Re: index prefetching

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 15:39:58 -0400, Andres Freund wrote: > Looking at the actual patches now. I just did an initial, not particularly in depth look. A few comments and questions below. For either patch, I think it's high time we split the index/table buffer stats in index scans. It's really ann

Re: index prefetching

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 14:30:05 -0400, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 2:27 PM Andres Freund wrote: > > Could you share the current version of the complex patch (happy with a git > > tree)? Afaict it hasn't been posted, which makes this pretty hard follow > > along > > / provide feed

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 3:00 PM Tomas Vondra wrote: > Yes, sounds like a fair summary. Cool. > Perhaps, although I don't quite see why the simpler patch couldn't > address some of those problems (within the limit of a single leaf page, > of course). I don't think there's anything that's prevent

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 20:18, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 1:42 PM Tomas Vondra wrote: >> On 7/16/25 16:45, Peter Geoghegan wrote: >>> I get that index characteristics could be the limiting factor, >>> especially in a world where we're not yet eagerly reading leaf pages. >>> But that in

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 2:27 PM Andres Freund wrote: > Could you share the current version of the complex patch (happy with a git > tree)? Afaict it hasn't been posted, which makes this pretty hard follow along > / provide feedback on, for others. Sure: https://github.com/petergeoghegan/postgres

Re: index prefetching

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 14:18:54 -0400, Peter Geoghegan wrote: > I don't fully understand why this appears to be less of a problem with > the complex patch. Can you help me to confirm my understanding? Could you share the current version of the complex patch (happy with a git tree)? Afaict it hasn't be

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 1:42 PM Tomas Vondra wrote: > On 7/16/25 16:45, Peter Geoghegan wrote: > > I get that index characteristics could be the limiting factor, > > especially in a world where we're not yet eagerly reading leaf pages. > > But that in no way justifies just forgetting about prefetc

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 18:39, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 11:29 AM Peter Geoghegan wrote: >> For example, with "linear_10 / eic=16 / sync", it looks like "complex" >> has about half the latency of "simple" in tests where selectivity is >> 10. The advantage for "complex" is even greater at

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 17:29, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 4:40 AM Tomas Vondra wrote: >> For "uniform" data set, both prefetch patches do much better than master >> (for low selectivities it's clearer in the log-scale chart). The >> "complex" prefetch patch appears to have a bit of an

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 16:45, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 10:37 AM Tomas Vondra wrote: >> What sounds weird? That the read_stream works like a stream of blocks, >> or that it can't do "pause" and we use "reset" as a workaround? > > The fact that prefetch distance is in any way affected b

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 11:29 AM Peter Geoghegan wrote: > For example, with "linear_10 / eic=16 / sync", it looks like "complex" > has about half the latency of "simple" in tests where selectivity is > 10. The advantage for "complex" is even greater at higher > "selectivity" values. All of the oth

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 4:40 AM Tomas Vondra wrote: > For "uniform" data set, both prefetch patches do much better than master > (for low selectivities it's clearer in the log-scale chart). The > "complex" prefetch patch appears to have a bit of an edge for >1% > selectivities. I find this a bit s

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 10:37 AM Tomas Vondra wrote: > What sounds weird? That the read_stream works like a stream of blocks, > or that it can't do "pause" and we use "reset" as a workaround? The fact that prefetch distance is in any way affected by a temporary inability to return more blocks. Ju

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 16:29, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 10:20 AM Tomas Vondra wrote: >> The read stream can only return blocks generated by the "next" callback. >> When we return the block for the last item on a leaf page, we can only >> return "InvalidBlockNumber" which means "no more

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 10:20 AM Tomas Vondra wrote: > The read stream can only return blocks generated by the "next" callback. > When we return the block for the last item on a leaf page, we can only > return "InvalidBlockNumber" which means "no more blocks in the stream". > And once we advance t

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 10:25 AM Andres Freund wrote: > This imo isn't something worth optimizing for - if you use an io_method that > actually can execute IO asynchronously this issue does not exist, as the start > of the IO will already have populated the buffer entry (without BM_VALID set, > of

Re: index prefetching

2025-07-16 Thread Andres Freund
Hi, On 2025-07-16 16:20:25 +0200, Tomas Vondra wrote: > On 7/16/25 16:07, Peter Geoghegan wrote: > >> Te pattern of fadvise+pread for the same block seems a bit silly. And > >> this is not just about "sync" method, the other methods will have a > >> similar issue with no starting the I/O earlier.

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 16:07, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 9:58 AM Tomas Vondra wrote: >>> The "simple" patch has _bt_readpage reset the read stream. That >>> doesn't make any sense to me. Though it does explain why the "complex" >>> patch does so many more fadvise calls. >>> >> >> Why it

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 9:58 AM Tomas Vondra wrote: > > The "simple" patch has _bt_readpage reset the read stream. That > > doesn't make any sense to me. Though it does explain why the "complex" > > patch does so many more fadvise calls. > > > > Why it doesn't make sense? The reset_stream_reset()

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 15:36, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 4:40 AM Tomas Vondra wrote: >> But the thing I don't really understand it the "cyclic" dataset (for >> example). And the "simple" patch performs really badly here. This data >> set is designed to not work for prefetching, it's p

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 9:36 AM Peter Geoghegan wrote: > Another issue with the "simple" patch: it adds 2 bool fields to > "BTScanPosItem". That increases its size considerably. We're very > sensitive to the size of this struct (I think that you know about this > already). Bloating it like this wi

Re: index prefetching

2025-07-16 Thread Peter Geoghegan
On Wed, Jul 16, 2025 at 4:40 AM Tomas Vondra wrote: > But the thing I don't really understand it the "cyclic" dataset (for > example). And the "simple" patch performs really badly here. This data > set is designed to not work for prefetching, it's pretty much an > adversary case. There's ~100 TIDs

Re: index prefetching

2025-07-15 Thread Peter Geoghegan
On Sat, Jul 12, 2025 at 7:50 PM Peter Geoghegan wrote: > Why wouldn't we want to relieve all AMs of that responsibility? > Leaving it up to index AMs has resulted in subtle bugs [2][3], and > AFAICT has no redeeming quality. If affected index AMs were *forced* > to do *exactly* the same thing as e

Re: index prefetching

2025-07-13 Thread Peter Geoghegan
On Sun, Jul 13, 2025 at 5:57 PM Tomas Vondra wrote: > Thank you! I'll take a look next week, but these numbers suggest you > simplified it a lot.. Right. I'm still not done removing code from nbtree here. I still haven't done things like generalize _bt_killitems across all index AMs. That can la

Re: index prefetching

2025-07-13 Thread Tomas Vondra
On 7/13/25 01:50, Peter Geoghegan wrote: > On Thu, May 1, 2025 at 7:02 PM Tomas Vondra wrote: >> There's two "fix" patches trying to make this work - it does not crash, >> and almost all the "incorrect" query results are actually stats about >> buffer hits etc. And that is expected to change wi

Re: index prefetching

2025-07-12 Thread Peter Geoghegan
On Thu, May 1, 2025 at 7:02 PM Tomas Vondra wrote: > There's two "fix" patches trying to make this work - it does not crash, > and almost all the "incorrect" query results are actually stats about > buffer hits etc. And that is expected to change with prefetching, not a > bug. But then there are a

Re: index prefetching

2025-07-09 Thread Tomas Vondra
Hi, I got pinged about issues (compiler warnings, and some test failures) in the simple patch version shared in May. So here's a rebased and cleaned up version addressing that, and a couple additional issues I ran into. FWIW if you run check-world on this, you may get failures in io_workers TAP t

Re: index prefetching

2025-04-22 Thread Peter Geoghegan
On Tue, Apr 22, 2025 at 2:34 PM Tomas Vondra wrote: > Yeah, that makes sense, although I've been thinking about this a bit > differently. I haven't been trying to establish a new "component" to > manage prefetching. For me the question was what's the right layer, so > that unnecessary details don'

Re: index prefetching

2025-04-22 Thread Tomas Vondra
er index AMs, considering how >> similar the design usually is to btree. >> >> This is one of the next things on my TODO. I want to be able to validate >> the design works for multiple AMs, not just btree. > > What's the most logical second index AM to support, after nbt

Re: index prefetching

2025-04-22 Thread Peter Geoghegan
> the design works for multiple AMs, not just btree. What's the most logical second index AM to support, after nbtree, then? Probably hash/hashgettuple? > I think this is a consequence of read_stream having an internal idea how > far ahead to prefetch, based on the number of re

Re: index prefetching

2025-04-04 Thread Tomas Vondra
On 4/2/25 18:05, Andres Freund wrote: > Hi, > > Since the patch has needed a rebase since mid February and is in Waiting on > Author since mid March, I think it'd be appropriate to mark this as Returned > with Feedback for now? Or at least moved to the next CF? > Yes, I agree. regards -- To

Re: index prefetching

2025-04-02 Thread Andres Freund
Hi, Since the patch has needed a rebase since mid February and is in Waiting on Author since mid March, I think it'd be appropriate to mark this as Returned with Feedback for now? Or at least moved to the next CF? Greetings, Andres Freund

Re: index prefetching

2024-11-11 Thread Peter Geoghegan
On Mon, Nov 11, 2024 at 2:00 PM Peter Geoghegan wrote: > The real sign that what I said is generally true of index AMs is that > you'll see so few calls to > LockBufferForCleanup/ConditionalLockBufferForCleanup. Only hash calls > ConditionalLockBufferForCleanup at all (which I find a bit weird). >

Re: index prefetching

2024-11-11 Thread Peter Geoghegan
On Mon, Nov 11, 2024 at 1:33 PM Robert Haas wrote: > That makes sense from the point of view of working with the btree code > itself, but from a system-wide perspective, it's weird to pretend like > the pins don't exist or don't matter just because a buffer lock is > also held. I can see how that

Re: index prefetching

2024-11-11 Thread Robert Haas
On Mon, Nov 11, 2024 at 1:03 PM Peter Geoghegan wrote: > I almost think of "pin held" and "buffer lock held" as synonymous when > working on the nbtree code, even though you have this one obscure page > deletion case where that isn't quite true (plus the TID recycle safety > business imposed by he

Re: index prefetching

2024-11-11 Thread Peter Geoghegan
On Mon, Nov 11, 2024 at 12:23 PM Robert Haas wrote: > > I think that holding onto pins and whatnot has almost nothing to do > > with the index AM as such -- it's about protecting against unsafe > > concurrent TID recycling, which is a table AM/heap issue. You can make > > a rather weak argument th

Re: index prefetching

2024-11-11 Thread Robert Haas
On Sun, Nov 10, 2024 at 5:41 PM Peter Geoghegan wrote: > > It seems to me knowing which pages may be pinned is very AM-specific > > knowledge, and my intention was to let the AM to manage that. > > This is useful information, because it helps me to understand how > you're viewing this. > > I total

Re: index prefetching

2024-11-10 Thread Peter Geoghegan
On Sun, Nov 10, 2024 at 4:41 PM Tomas Vondra wrote: > Is it a good idea to make this part (in indexam.c) aware of / > responsible for managing stuff like pins? My sense is that that's the right long term architectural direction. I can't really prove it. > Perhaps it'd work fine for > index AMs t

Re: index prefetching

2024-11-10 Thread Tomas Vondra
On 11/8/24 02:35, Peter Geoghegan wrote: > On Thu, Nov 7, 2024 at 4:34 PM Tomas Vondra wrote: >> Not sure I understand, but I think I'm somewhat confused by "index AM" >> vs. indexam. Are you suggesting the individual index AMs should know as >> little about the batching as possible, and instead i

Re: index prefetching

2024-11-07 Thread Peter Geoghegan
On Thu, Nov 7, 2024 at 4:34 PM Tomas Vondra wrote: > Not sure I understand, but I think I'm somewhat confused by "index AM" > vs. indexam. Are you suggesting the individual index AMs should know as > little about the batching as possible, and instead it should be up to > indexam.c to orchestrate m

Re: index prefetching

2024-11-07 Thread Tomas Vondra
On 11/7/24 18:55, Peter Geoghegan wrote: > On Thu, Nov 7, 2024 at 10:03 AM Tomas Vondra wrote: >> The primary reason why I kept amgettuple() as is, and added a new AM >> callback for the "batch" mode is backwards compatibility. I did not want >> to force all AMs to do this, I think it should be op

Re: index prefetching

2024-11-07 Thread Peter Geoghegan
On Thu, Nov 7, 2024 at 10:03 AM Tomas Vondra wrote: > The primary reason why I kept amgettuple() as is, and added a new AM > callback for the "batch" mode is backwards compatibility. I did not want > to force all AMs to do this, I think it should be optional. Not only to > limit the disruption for

Re: index prefetching

2024-11-07 Thread Tomas Vondra
On 11/7/24 01:38, Peter Geoghegan wrote: > On Wed, Nov 6, 2024 at 12:25 PM Tomas Vondra wrote: >> Attached is an updated version of this patch series. The first couple >> parts (adding batching + updating built-in index AMs) remain the same, >> the new part is 0007 which switches index scans to re

Re: index prefetching

2024-11-06 Thread Peter Geoghegan
On Wed, Nov 6, 2024 at 12:25 PM Tomas Vondra wrote: > Attached is an updated version of this patch series. The first couple > parts (adding batching + updating built-in index AMs) remain the same, > the new part is 0007 which switches index scans to read stream API. The first thing that I notice

Re: index prefetching

2024-08-31 Thread Tomas Vondra
a couple limitations and things that need cleanup, ofc. Those are mentioned at the end of this message. The index prefetching had two prior patch versions, with very different approaches, each having different drawbacks. The first one (posted shortly before pgcon 2023) did the prefetching at

Re: index prefetching

2024-03-05 Thread Jakub Wartak
On Fri, Mar 1, 2024 at 3:58 PM Tomas Vondra wrote: [..] > TBH I don't have a clear idea what to do. It'd be cool to have at least > some benefits in v17, but I don't know how to do that in a way that > would be useful in the future. > > For example, the v20240124 patch implements this in the execu

Re: index prefetching

2024-03-01 Thread Peter Geoghegan
On Fri, Mar 1, 2024 at 10:18 AM Tomas Vondra wrote: > But I have very hard time figuring out what the MVP version should be, > because I have very limited understanding on how much control the index > AM ought to have :-( And it'd be a bit silly to do something in v17, > only to have to rip it out

Re: index prefetching

2024-03-01 Thread Tomas Vondra
On 2/15/24 21:30, Peter Geoghegan wrote: > On Thu, Feb 15, 2024 at 3:13 PM Andres Freund wrote: >>> This is why I don't think that the tuples with lower page offset >>> numbers are in any way significant here. The significant part is >>> whether or not you'll actually need to visit more than one

Re: index prefetching

2024-03-01 Thread Tomas Vondra
Hi, Thanks for looking at the patch! On 3/1/24 09:20, Jakub Wartak wrote: > On Wed, Jan 24, 2024 at 7:13 PM Tomas Vondra > wrote: > [ >> >> (1) Melanie actually presented a very different way to implement this, >> relying on the StreamingRead API. So chances are this struct won't >> actually be

Re: index prefetching

2024-03-01 Thread Jakub Wartak
On Wed, Jan 24, 2024 at 7:13 PM Tomas Vondra wrote: [ > > (1) Melanie actually presented a very different way to implement this, > relying on the StreamingRead API. So chances are this struct won't > actually be used. Given lots of effort already spent on this and the fact that is thread is actua

Re: index prefetching

2024-02-15 Thread Peter Geoghegan
On Thu, Feb 15, 2024 at 3:13 PM Andres Freund wrote: > > This is why I don't think that the tuples with lower page offset > > numbers are in any way significant here. The significant part is > > whether or not you'll actually need to visit more than one leaf page > > in the first place (plus the

Re: index prefetching

2024-02-15 Thread Andres Freund
Hi, On 2024-02-15 12:53:10 -0500, Peter Geoghegan wrote: > On Thu, Feb 15, 2024 at 12:26 PM Tomas Vondra > wrote: > > I may be missing something, but it seems fairly self-evident to me an > > entry at the beginning of an index page won't get prefetched (assuming > > the page-at-a-time thing). >

Re: index prefetching

2024-02-15 Thread Peter Geoghegan
On Thu, Feb 15, 2024 at 12:26 PM Tomas Vondra wrote: > I may be missing something, but it seems fairly self-evident to me an > entry at the beginning of an index page won't get prefetched (assuming > the page-at-a-time thing). Sure, if the first item on the page is also the first item that we nee

Re: index prefetching

2024-02-15 Thread Tomas Vondra
On 2/15/24 17:42, Peter Geoghegan wrote: > On Thu, Feb 15, 2024 at 9:36 AM Tomas Vondra > wrote: >> On 2/15/24 00:06, Peter Geoghegan wrote: >>> I suppose that it might be much more important than I imagine it is >>> right now, but it'd be nice to have something a bit more concrete to >>> go on

Re: index prefetching

2024-02-15 Thread Peter Geoghegan
On Thu, Feb 15, 2024 at 9:36 AM Tomas Vondra wrote: > On 2/15/24 00:06, Peter Geoghegan wrote: > > I suppose that it might be much more important than I imagine it is > > right now, but it'd be nice to have something a bit more concrete to > > go on. > > > > This probably depends on which corner c

Re: index prefetching

2024-02-15 Thread Tomas Vondra
On 2/15/24 00:06, Peter Geoghegan wrote: > On Wed, Feb 14, 2024 at 4:46 PM Melanie Plageman > wrote: > >> ... > > 2. Are you sure that the leaf-page-at-a-time thing is such a huge > hindrance to effective prefetching? > > I suppose that it might be much more important than I imagine it is > righ

Re: index prefetching

2024-02-14 Thread Robert Haas
the patch stands - not hold a pin on the "current" index > leaf page. Which makes index prefetching as currently implemented incompatible > with kill_prior_tuple, as that requires the index leaf page pin being held. But I think it probably also breaks MVCC, as Peter was saying. -- Robert Haas EDB: http://www.enterprisedb.com

Re: index prefetching

2024-02-14 Thread Andres Freund
have to wait for IO for the first tuples referenced by an index leaf page. However, if we want to issue table readahead for tids on the neighboring index leaf page, we'll - as the patch stands - not hold a pin on the "current" index leaf page. Which makes index prefetching as currently

Re: index prefetching

2024-02-14 Thread Robert Haas
On Wed, Feb 14, 2024 at 7:43 PM Tomas Vondra wrote: > I don't think it's just a bookkeeping problem. In a way, nbtree already > does keep an array of tuples to kill (see btgettuple), but it's always > for the current index page. So it's not that we immediately go and kill > the prior tuple - nbtre

Re: index prefetching

2024-02-14 Thread Peter Geoghegan
've seen production issues like that too. No doubt it's a problem. > We might be able to design a system where the bitmap > contains a certain number of back-references to the index, allowing later > cleanup if there weren't any page splits or such. That does seem possible, but

Re: index prefetching

2024-02-14 Thread Andres Freund
Hi, On 2024-02-13 14:54:14 -0500, Peter Geoghegan wrote: > This property of index scans is fundamental to how index scans work. > Pinning an index page as an interlock against concurrently TID > recycling by VACUUM is directly described by the index API docs [1], > even (the docs actually use term

Re: index prefetching

2024-02-14 Thread Andres Freund
Hi, On 2024-02-14 16:45:57 -0500, Melanie Plageman wrote: > > > The LIMIT problem is not very clear to me either. Yes, if we get close > > > to the end of the leaf page, we may need to visit the next leaf page. > > > But that's kinda the whole point of prefetching - reading stuff ahead, > > > and

Re: index prefetching

2024-02-14 Thread Peter Geoghegan
part working in v1? Tomas' original prototype worked with the leaf-page-at-a-time thing, and that still seemed like a big improvement to me. While being less invasive, in effect. If we can agree that something like that represents a useful step in the right direction (not an evolutionary dead end

Re: index prefetching

2024-02-14 Thread Melanie Plageman
the control flow - what part would be > > managed by the index AM? > > ISTM that prefetching for an index scan is about the index scan > itself, first and foremost. The heap accesses are usually the dominant > cost, of course, but sometimes the index leaf page accesses really do &g

Re: index prefetching

2024-02-14 Thread Melanie Plageman
On Wed, Feb 14, 2024 at 11:40 AM Melanie Plageman wrote: > > On Tue, Feb 13, 2024 at 2:01 PM Tomas Vondra > wrote: > > > > On 2/7/24 22:48, Melanie Plageman wrote: > > > ... > > > - switching scan directions > > > > > > If the index scan switches directions on a given invocation of > > > IndexNex

Re: index prefetching

2024-02-14 Thread Peter Geoghegan
On Wed, Feb 14, 2024 at 11:40 AM Melanie Plageman wrote: > I wasn't quite sure how we could use > index_compute_xid_horizon_for_tuples() for inspiration -- per Peter's > suggestion. But, I'd like to understand. The point I was trying to make with that example was: a highly generic mechanism can s

Re: index prefetching

2024-02-14 Thread Peter Geoghegan
On Wed, Feb 14, 2024 at 8:34 AM Tomas Vondra wrote: > > Another thing that argues against doing this is that we might not need > > to visit any more B-Tree leaf pages when there is a LIMIT n involved. > > We could end up scanning a whole extra leaf page (including all of its > > tuples) for want o

Re: index prefetching

2024-02-14 Thread Melanie Plageman
On Tue, Feb 13, 2024 at 2:01 PM Tomas Vondra wrote: > > On 2/7/24 22:48, Melanie Plageman wrote: > > ... Issues > > --- > > - kill prior tuple > > > > This optimization doesn't work with index prefetching with the current > > design. Kill prior

Re: index prefetching

2024-02-14 Thread Tomas Vondra
On 2/14/24 08:10, Robert Haas wrote: > On Thu, Feb 8, 2024 at 3:18 AM Melanie Plageman > wrote: >> - kill prior tuple >> >> This optimization doesn't work with index prefetching with the current >> design. Kill prior tuple relies on alternating between f

Re: index prefetching

2024-02-14 Thread Tomas Vondra
On 2/13/24 20:54, Peter Geoghegan wrote: > On Tue, Feb 13, 2024 at 2:01 PM Tomas Vondra > wrote: >> On 2/7/24 22:48, Melanie Plageman wrote: >> I admit I haven't thought about kill_prior_tuple until you pointed out. >> Yeah, prefetching separates (de-synchronizes) the two scans (index and >> heap)

Re: index prefetching

2024-02-13 Thread Robert Haas
On Thu, Feb 8, 2024 at 3:18 AM Melanie Plageman wrote: > - kill prior tuple > > This optimization doesn't work with index prefetching with the current > design. Kill prior tuple relies on alternating between fetching a > single index tuple and visiting the heap. After visiti

Re: index prefetching

2024-02-13 Thread Peter Geoghegan
On Tue, Feb 13, 2024 at 2:01 PM Tomas Vondra wrote: > On 2/7/24 22:48, Melanie Plageman wrote: > I admit I haven't thought about kill_prior_tuple until you pointed out. > Yeah, prefetching separates (de-synchronizes) the two scans (index and > heap) in a way that prevents this optimization. Or at

Re: index prefetching

2024-02-13 Thread Tomas Vondra
would need to solve in the streaming read API. Some are > issues with index prefetching generally. > > Note that these two patches have to be applied before 21d9c3ee4e > because Thomas hasn't released a rebased version of the streaming read > API patches yet. > Thanks

Re: index prefetching

2024-02-07 Thread Melanie Plageman
the regression tests > (like in the alter table test), so I suspect I have some kind of > control flow issue. Perhaps I should fix the resource leak so I can > actually see the failing tests :) Attached is a patch which implements a real queue and fixes some of the issues with the prev

Re: index prefetching

2024-01-25 Thread Tomas Vondra
On 1/25/24 11:45, Dilip Kumar wrote: > On Wed, Jan 24, 2024 at 11:43 PM Tomas Vondra > wrote: > >> On 1/22/24 07:35, Konstantin Knizhnik wrote: >>> >>> On 22/01/2024 1:47 am, Tomas Vondra wrote: h, right. Well, you're right in this case we perhaps could set just one of those flags, b

Re: index prefetching

2024-01-25 Thread Dilip Kumar
On Wed, Jan 24, 2024 at 11:43 PM Tomas Vondra wrote: > On 1/22/24 07:35, Konstantin Knizhnik wrote: > > > > On 22/01/2024 1:47 am, Tomas Vondra wrote: > >> h, right. Well, you're right in this case we perhaps could set just one > >> of those flags, but the "purpose" of the two places is quite dif

Re: index prefetching

2024-01-24 Thread Melanie Plageman
;>> bother investigating [which causes it to fail tests]). > >>> > >>> 0001 is all of Thomas' streaming read API code that isn't yet in > >>> master and 0002 is my rough sketch of index prefetching using the > >>> streaming read API &

Re: index prefetching

2024-01-24 Thread Tomas Vondra
On 1/22/24 07:35, Konstantin Knizhnik wrote: > > On 22/01/2024 1:47 am, Tomas Vondra wrote: >> h, right. Well, you're right in this case we perhaps could set just one >> of those flags, but the "purpose" of the two places is quite different. >> >> The "prefetch" flag is fully controlled by the p

Re: index prefetching

2024-01-24 Thread Tomas Vondra
On 1/22/24 08:21, Konstantin Knizhnik wrote: > > On 22/01/2024 1:39 am, Tomas Vondra wrote: >>> Why we can prefer covering index  to compound index? I see only two good >>> reasons: >>> 1. Extra columns type do not  have comparison function need for AM. >>> 2. The extra columns are never used in q

Re: index prefetching

2024-01-24 Thread Tomas Vondra
at depends entirely on the StreamingRead API queue. A lage TID queue can't cause overload of anything. What could happen is a TID queue being too small, so the prefetch can't hit the target distance. But that can happen already, e.g. indexes that are correlated and/or index-only scans w

Re: index prefetching

2024-01-23 Thread Melanie Plageman
ItemPointerData. I will work on implementing an actual queue next week. > > There are also table AM layering violations in my sketch which would > > have to be worked out (not to mention some resource leakage I didn't > > bother investigating [which causes it to fail test

  1   2   >