Re: index prefetching

2025-07-19 Thread Tomas Vondra
On 7/19/25 06:03, Thomas Munro wrote: > On Sat, Jul 19, 2025 at 6:31 AM Tomas Vondra wrote: >> Perhaps the ReadStream should do something like this? Of course, the >> simple patch resets the stream very often, likely much more often than >> anything else in the code. But woul

Re: Adding basic NUMA awareness

2025-07-18 Thread Tomas Vondra
On 7/18/25 18:46, Andres Freund wrote: > Hi, > > On 2025-07-17 23:11:16 +0200, Tomas Vondra wrote: >> Here's a v2 of the patch series, with a couple changes: > > Not a deep look at the code, just a quick reply. > > >> * I changed the freelist partitio

Re: index prefetching

2025-07-18 Thread Tomas Vondra
ps the ReadStream should do something like this? Of course, the simple patch resets the stream very often, likely much more often than anything else in the code. But wouldn't it be beneficial for streams reset because of a rescan? Possibly needs to be optional. regards -- Tomas Vondra From

Re: index prefetching

2025-07-18 Thread Tomas Vondra
esponsible for calling index_batch_getnext(). Isn't the batching mostly an "implementation" detail of the index AM? That's how I was thinking about it, at least. Some of these arguments could be used against the current patch, where the next_block callback is defined by executor nodes

Re: Adding basic NUMA awareness

2025-07-17 Thread Tomas Vondra
On 7/4/25 20:12, Tomas Vondra wrote: > On 7/4/25 13:05, Jakub Wartak wrote: >> ... >> >> 8. v1-0005 2x + /* if (numa_procs_interleave) */ >> >>Ha! it's a TRAP! I've uncommented it because I wanted to try it out >> without it (just

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 19:56, Tomas Vondra wrote: > On 7/16/25 18:39, Peter Geoghegan wrote: >> On Wed, Jul 16, 2025 at 11:29 AM Peter Geoghegan wrote: >>> For example, with "linear_10 / eic=16 / sync", it looks like "complex" >>> has about half the latency o

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 20:18, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 1:42 PM Tomas Vondra wrote: >> On 7/16/25 16:45, Peter Geoghegan wrote: >>> I get that index characteristics could be the limiting factor, >>> especially in a world where we're not yet eagerly

Re: index prefetching

2025-07-16 Thread Tomas Vondra
ions. If you copy the first couple lines, you'll get scans.db, with nice column names and all that. The selectivity is calculated as (rows / total_rows) where rows is the rowcount returned by the query, and total_rows is reltuples. I also had charts with "page selectivity", but that often got a bunch of 100% points squashed on the right edge, so I stopped generating those. regards -- Tomas Vondra

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 17:29, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 4:40 AM Tomas Vondra wrote: >> For "uniform" data set, both prefetch patches do much better than master >> (for low selectivities it's clearer in the log-scale chart). The >> "complex"

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 16:45, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 10:37 AM Tomas Vondra wrote: >> What sounds weird? That the read_stream works like a stream of blocks, >> or that it can't do "pause" and we use "reset" as a workaround? > > The fact

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 16:29, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 10:20 AM Tomas Vondra wrote: >> The read stream can only return blocks generated by the "next" callback. >> When we return the block for the last item on a leaf page, we can only >> return "Inva

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 16:07, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 9:58 AM Tomas Vondra wrote: >>> The "simple" patch has _bt_readpage reset the read stream. That >>> doesn't make any sense to me. Though it does explain why the "complex" >>>

Re: index prefetching

2025-07-16 Thread Tomas Vondra
On 7/16/25 15:36, Peter Geoghegan wrote: > On Wed, Jul 16, 2025 at 4:40 AM Tomas Vondra wrote: >> But the thing I don't really understand it the "cyclic" dataset (for >> example). And the "simple" patch performs really badly here. This data >> set

Re: AIO v2.5

2025-07-14 Thread Tomas Vondra
On 7/14/25 20:36, Andres Freund wrote: > Hi, > > On 2025-07-11 23:03:53 +0200, Tomas Vondra wrote: >> I've been running some benchmarks comparing the io_methods, to help with >> resolving this PG18 open item. So here are some results, and my brief >> analysis o

Re: AIO v2.5

2025-07-14 Thread Tomas Vondra
On 7/14/25 20:44, Andres Freund wrote: > On 2025-07-13 20:04:51 +0200, Tomas Vondra wrote: >> On 7/11/25 23:03, Tomas Vondra wrote: >>> ... >>> >>> e) indexscan regression (ryzen-indexscan-uniform-pg17-checksums.png) >>> >>> There's an

Re: index prefetching

2025-07-13 Thread Tomas Vondra
On 7/13/25 01:50, Peter Geoghegan wrote: > On Thu, May 1, 2025 at 7:02 PM Tomas Vondra wrote: >> There's two "fix" patches trying to make this work - it does not crash, >> and almost all the "incorrect" query results are actually stats about >> b

Re: Proposal: Role Sandboxing for Secure Impersonation

2025-07-13 Thread Tomas Vondra
t in an explicit transaction. If the pooler starts wrapping everything in a transaction, this would break. Not sure if this might cause some issues with "idle in transaction" sessions. Maybe not ... So I don't think the pooler should be starting transactions, unless the user actually started a transaction. regards -- Tomas Vondra

Re: Proposal: Role Sandboxing for Secure Impersonation

2025-07-13 Thread Tomas Vondra
l too. Possibly by making the SET ROLE protocol "thing" generic enough to cover this kind of use case too. The other thing is that out-of-core extensions are not great for managed systems (e.g. set_user is nice, but how many systems can use that?). While that's really a problem of providers of those systems, I wonder if we should have something like this built-in. regards [1] https://www.postgresql.org/docs/current/ddl-rowsecurity.html [2] https://github.com/tvondra/signed_context -- Tomas Vondra

Re: AIO v2.5

2025-07-13 Thread Tomas Vondra
On 7/11/25 23:03, Tomas Vondra wrote: > ... > > e) indexscan regression (ryzen-indexscan-uniform-pg17-checksums.png) > > There's an interesting difference I noticed in the run with > checksums on PG17. The full PDF is available here: > > https://github.co

Re: Adding basic NUMA awareness

2025-07-10 Thread Tomas Vondra
nse to partition the numBufferAllocs too, though? I don't remember if my hacky experimental NUMA-partitioning patch did that or I just thought about doing that, but why wouldn't that be enough? Places that need the "total" count would have to sum the counters, but it seemed to me most of the places would be fine with the "local" count for that partition. If we also make sure to "sync" the clocksweeps so as not to work on just a single partition, that might be enough ... regards -- Tomas Vondra
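To illustrate the idea of per-partition counters summed only where a total is needed, here is a minimal sketch using PostgreSQL's atomics; the struct and function names are hypothetical, not taken from any of the patches in this thread:

    /* assumes postgres.h and "port/atomics.h" are included */
    #define NUM_PARTITIONS 4                /* assumption: one counter per partition */

    typedef struct PartitionedCounter
    {
        pg_atomic_uint64 counts[NUM_PARTITIONS];    /* hypothetical per-partition numBufferAllocs */
    } PartitionedCounter;

    /* fast path: bump only the local partition's counter */
    static inline void
    counter_inc(PartitionedCounter *c, int partition)
    {
        pg_atomic_fetch_add_u64(&c->counts[partition], 1);
    }

    /* slow path: the few places that need the "total" sum all partitions */
    static inline uint64
    counter_total(PartitionedCounter *c)
    {
        uint64      total = 0;

        for (int i = 0; i < NUM_PARTITIONS; i++)
            total += pg_atomic_read_u64(&c->counts[i]);
        return total;
    }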

Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach

2025-07-10 Thread Tomas Vondra
and all as you do is great, and combo of both approach probably > of great interest. There is also this weighted interleave discussed and > probably much more to come in this area in Linux. > > I think some points raised already about possible distinct policies, I > am precisely claiming that it is hard to come with one good policy with > limited setup options, thus requirement to keep that flexible enough > (hooks, api, 100 GUc ?). > I'm sorry, I don't want to sound too negative, but "I want arbitrary extensibility" is not very useful feedback. I've asked you to give some examples of policies that'd customize some of the NUMA stuff. > There is an EPYC story here also, given the NUMA setup can vary > depending on BIOS setup, associated NUMA policy must probably take that > into account (L3 can be either real cache or 4 extra "local" NUMA nodes > - with highly distinct access cost from a RAM module). > Does that change how PostgreSQL will place memory and process? Is it > important or of interest ? > So how exactly would the policy handle this? Right now we're entirely oblivious to L3, or on-CPU caches in general. We don't even consider the size of L3 when sizing hash tables in a hashjoin etc. regards -- Tomas Vondra

Re: index prefetching

2025-07-09 Thread Tomas Vondra
me very helpful chats about this with Peter Geoghegan, and I'm still open to the possibility of making it work. This simpler version is partially a hedge to have at least something in case the complex patch does not make it. regards [1] https://www.postgresql.org/message-id/t5aqjhkj6xdkido535

Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach

2025-07-08 Thread Tomas Vondra
ProcessRoutine(), and registered in pmroutine struct: > > pmroutine = GetPmRoutineForInitProcess(); > if (pmroutine != NULL && >     pmroutine->init_process != NULL) >     pmroutine->init_process(MyProc); > > This way it's easier to manage alternative policies, and also to be able > to adjust when hardware and linux kernel changes. > I'm not against making this extensible, in some way. But I still struggle to imagine a reasonable alternative policy, where the external module gets the same information and ends up with a different decision. So what would the alternate policy look like? What use case would the module be supporting? regards -- Tomas Vondra

Re: amcheck support for BRIN indexes

2025-07-08 Thread Tomas Vondra
On 7/8/25 14:40, Arseniy Mukhin wrote: > On Mon, Jul 7, 2025 at 3:21 PM Álvaro Herrera wrote: >> >> On 2025-Jul-07, Tomas Vondra wrote: >> >>> Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE? >>> I'd probably try

Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach

2025-07-08 Thread Tomas Vondra
't have any idea what exactly the external module would do, or how it would decide where to place the backend. Can you describe some use case with an example? Assuming we want to actually pin tasks from within Postgres, what I think might work is allowing modules to "advise" on where to place the task. But the decision would still be done by core. regards -- Tomas Vondra

Re: Adding basic NUMA awareness

2025-07-08 Thread Tomas Vondra
On 7/8/25 05:04, Andres Freund wrote: > Hi, > > On 2025-07-04 13:05:05 +0200, Jakub Wartak wrote: >> On Tue, Jul 1, 2025 at 9:07 PM Tomas Vondra wrote: >>> I don't think the splitting would actually make some things simpler, or >>> maybe more flexible -

Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach

2025-07-07 Thread Tomas Vondra
's not that much flexibility. The last bit (pinning backends to a NUMA node) is experimental, and mostly intended for easier evaluation of the earlier parts (e.g. to limit the noise when processes get moved to a CPU from a different NUMA node, and so on). regards -- Tomas Vondra

Re: amcheck support for BRIN indexes

2025-07-07 Thread Tomas Vondra
eded, so let's get rid of it. > Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE? I'd probably try to do this using the regular consistent function: (a) we don't need to add stuff to all BRIN opclasses to support this (b) it gives us additional testing of the consistent function (c) building a scan key for equality seems pretty trivial What do you think? -- Tomas Vondra

Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach

2025-07-07 Thread Tomas Vondra
On 7/5/25 09:09, Cédric Villemain wrote: > Hi Tomas, > > > I haven't yet had time to fully read all the work and proposals around > NUMA and related features, but I hope to catch up over the summer. > > However, I think it's important to share some thoughts

Re: Changing shared_buffers without restart

2025-07-07 Thread Tomas Vondra
On 7/5/25 12:35, Dmitry Dolgov wrote: >> On Fri, Jul 04, 2025 at 05:23:29PM +0200, Tomas Vondra wrote: >>>> 2) pending GUC changes >>>> >>>> Perhaps this should be a separate utility command, or maybe even just >>>> a new ALTER SYSTEM vari

Re: Adding basic NUMA awareness

2025-07-04 Thread Tomas Vondra
On 7/4/25 13:05, Jakub Wartak wrote: > On Tue, Jul 1, 2025 at 9:07 PM Tomas Vondra wrote: > > Hi! > >> 1) v1-0001-NUMA-interleaving-buffers.patch > [..] >> It's a bit more complicated, because the patch distributes both the >> blocks and descriptors,

Re: Changing shared_buffers without restart

2025-07-04 Thread Tomas Vondra
On 7/4/25 16:41, Dmitry Dolgov wrote: >> On Fri, Jul 04, 2025 at 02:06:16AM +0200, Tomas Vondra wrote: >> I took a look at this patch, because it's somewhat related to the NUMA >> patch series I posted a couple days ago, and I've been wondering if >> it make

Re: Changing shared_buffers without restart

2025-07-03 Thread Tomas Vondra
tps://www.postgresql.org/message-id/CA%2BhUKGL5hW3i_pk5y_gcbF_C5kP-pWFjCuM8bAyCeHo3xUaH8g%40mail.gmail.com [3] https://www.postgresql.org/message-id/12add41a-7625-4639-a394-a5563e349322%40eisentraut.org [4] https://www.postgresql.org/message-id/CA%2BTgmoZFfn0E%2BEkUAjnv_QM_00eUJPkgCJKzm3n1G4itJKMS

Re: Adding basic NUMA awareness

2025-07-02 Thread Tomas Vondra
On 7/2/25 13:37, Ashutosh Bapat wrote: > On Wed, Jul 2, 2025 at 12:37 AM Tomas Vondra wrote: >> >> >> 3) v1-0003-freelist-Don-t-track-tail-of-a-freelist.patch >> >> Minor optimization. Andres noticed we're tracking the tail of buffer >> freelist,

Re: Add os_page_num to pg_buffercache

2025-07-01 Thread Tomas Vondra
On 7/1/25 19:20, Bertrand Drouvot wrote: > Hi, > > On Tue, Jul 01, 2025 at 06:45:37PM +0200, Tomas Vondra wrote: >> On 7/1/25 18:34, Bertrand Drouvot wrote: >> >> But isn't the _numa view good enough for this? Sure, you need NUMA >> support for it, and it m

Re: NUMA shared memory interleaving

2025-07-01 Thread Tomas Vondra
Hi Jakub, FYI I've posted my experimental NUMA patch series here: https://www.postgresql.org/message-id/099b9433-2855-4f1b-b421-d078a5d82017%40vondra.me I've considered posting it to this thread, but it seemed sufficiently different to start a new thread. regards -- Tomas Vondra

Adding basic NUMA awareness

2025-07-01 Thread Tomas Vondra
one, but I guess it'd require a bit of code invoked sometime after the resize. It'd already need to rebuild the freelists in some way, I guess. The other thing I haven't thought about very much is determining on which CPUs/nodes the instance is allowed to run. I assume we'd s

Re: Add os_page_num to pg_buffercache

2025-07-01 Thread Tomas Vondra
On 7/1/25 18:34, Bertrand Drouvot wrote: > Hi, > > On Tue, Jul 01, 2025 at 04:31:01PM +0200, Tomas Vondra wrote: >> On 7/1/25 15:45, Bertrand Drouvot wrote: >> >> I took a quick look on this, > > Thanks for looking at it! > >> and I doubt we want to c

Re: No error checking when reading from file using zstd in pg_dump

2025-07-01 Thread Tomas Vondra
't as assuring as it should be since there is a lack > of test coverage). > Could you elaborate what you mean by lack of test coverage? Doesn't pg_dump have TAP tests exercising all compression methods? Perhaps it does not exercise all parts of the code, and we could improve that? regards -- Tomas Vondra

Re: Add os_page_num to pg_buffercache

2025-07-01 Thread Tomas Vondra
On 7/1/25 15:45, Bertrand Drouvot wrote: > Hi, > > On Thu, Apr 10, 2025 at 03:05:29PM +, Bertrand Drouvot wrote: >> Hi, >> >> On Thu, Apr 10, 2025 at 09:58:18AM -0500, Nathan Bossart wrote: >>> On Thu, Apr 10, 2025 at 03:35:24PM +0200, Tomas Vondra wrote:

Re: NUMA shared memory interleaving

2025-07-01 Thread Tomas Vondra
On 7/1/25 11:04, Jakub Wartak wrote: > On Mon, Jun 30, 2025 at 9:23 PM Tomas Vondra wrote: >> >> I wasn't suggesting to do "numactl --interleave=all". My argument was >> simply that doing numa_interleave_memory() has most of the same issues, >> beca

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-07-01 Thread Tomas Vondra
On 7/1/25 06:06, Bertrand Drouvot wrote: > Hi, > > On Mon, Jun 30, 2025 at 08:56:43PM +0200, Tomas Vondra wrote: >> In particular it now uses "chunking" instead of "batching". I believe >> batching is "combining multiple requests into a single

Re: NUMA shared memory interleaving

2025-06-30 Thread Tomas Vondra
On 6/30/25 12:55, Jakub Wartak wrote: > Hi Tomas! > > On Fri, Jun 27, 2025 at 6:41 PM Tomas Vondra wrote: > >> I agree we should improve the behavior on NUMA systems. But I'm not sure >> this patch goes far enough, or perhaps the approach seems a bit too

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-30 Thread Tomas Vondra
On 6/27/25 19:33, Bertrand Drouvot wrote: > Hi, > > On Fri, Jun 27, 2025 at 04:52:08PM +0200, Tomas Vondra wrote: >> Here's three small patches, that should handle the issue > > Thanks for the patches! > >> 0001 - Adds the batching into pg_numa_query_pages, s

Re: NUMA shared memory interleaving

2025-06-27 Thread Tomas Vondra
igure that from Postgres? Aren't people likely to already use something like containers or k8s anyway? I think we should just try to inherit this from the environment, i.e. determine which nodes we're allowed to run on, and use that. Maybe we'll find we need to be smarter, but I think we can leave that for later. regards -- Tomas Vondra

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-27 Thread Tomas Vondra
ng can take quite a bit of time, so letting people interrupt it seems reasonable. It wasn't possible with just one call into the kernel, but with the batching we can add a CFI. Please take a look. regards -- Tomas Vondra From 3d935f62665a18d96e6bec59cb1f3f7cd7daa068 Mon Sep 17 00:00:00 20
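A rough sketch of the chunking idea described above, with a CHECK_FOR_INTERRUPTS() between chunks; the chunk size is arbitrary and the pg_numa_query_pages() signature shown here is an assumption, not copied from the committed patch:

    /* assumes postgres.h, miscadmin.h and port/pg_numa.h are included */
    #define NUMA_QUERY_CHUNK    1024        /* assumed chunk size */

    static void
    query_numa_status_chunked(void **pages, int *status, uint64 npages)
    {
        for (uint64 off = 0; off < npages; off += NUMA_QUERY_CHUNK)
        {
            uint64      chunk = Min(NUMA_QUERY_CHUNK, npages - off);

            /* one smaller call into the kernel per chunk */
            if (pg_numa_query_pages(0, chunk, &pages[off], &status[off]) < 0)
                elog(ERROR, "could not query NUMA status: %m");

            /* the point of chunking: allow query cancellation between chunks */
            CHECK_FOR_INTERRUPTS();
        }
    }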

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-26 Thread Tomas Vondra
On 6/26/25 08:00, Bertrand Drouvot wrote: > Hi, > > On Tue, Jun 24, 2025 at 10:32:25PM +0200, Tomas Vondra wrote: >> On 6/24/25 17:30, Christoph Berg wrote: >>> Re: Tomas Vondra >>>> If it's a reliable fix, then I guess we can do it like this. But wo

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-25 Thread Tomas Vondra
On 6/24/25 10:24, Bertrand Drouvot wrote: > Hi, > > On Tue, Jun 24, 2025 at 03:43:19AM +0200, Tomas Vondra wrote: >> On 6/23/25 23:47, Tomas Vondra wrote: >>> ... >>> >>> Or maybe the 32-bit chroot on 64-bit host matters and confuses some >>> c

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-25 Thread Tomas Vondra
On 6/25/25 14:42, Álvaro Herrera wrote: > On 2025-Jun-25, Tomas Vondra wrote: > >> Not sure. I thought NUMA doesn't matter very much on 32-bit systems too, >> exactly because those systems tend to use small amounts of memory. But >> then while investigating this i

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-25 Thread Tomas Vondra
> > I was also missing it in my suggested patch draft, but this should > probably include #ifdef __linux__. > > > Re: Tomas Vondra >> +#ifdef USE_VALGRIND >> + >> +static inline void >> +pg_numa_touch_mem_if_required(uint64 tmp, char *ptr) > > St

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-25 Thread Tomas Vondra
On 6/25/25 09:15, Jakub Wartak wrote: > On Tue, Jun 24, 2025 at 5:30 PM Christoph Berg wrote: >> >> Re: Tomas Vondra >>> If it's a reliable fix, then I guess we can do it like this. But won't >>> that be a performance penalty on everyone? Or does the

Re: Remove unneeded check for XLH_INSERT_ALL_FROZEN in heap_xlog_insert

2025-06-24 Thread Tomas Vondra
atch this, even if it's ultimately harmless, just to keep the code from being confusing. regards -- Tomas Vondra

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Tomas Vondra
On 6/24/25 17:30, Christoph Berg wrote: > Re: Tomas Vondra >> If it's a reliable fix, then I guess we can do it like this. But won't >> that be a performance penalty on everyone? Or does the system split the >> array into 16-element chunks anyway, so this makes no

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Tomas Vondra
ant to rely too much on my >>> interpretation of it. >> >> I don't have that much experience too but I think the issue is in >> do_pages_stat() >> and that "pages += chunk_nr" should be advanced by sizeof(compat_uptr_t) >> instead. > > Me neither, but I'll try submit this fix. > +1 Thanks to both of you for the report and the investigation. regards -- Tomas Vondra

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Tomas Vondra
On 6/24/25 13:10, Andres Freund wrote: > Hi, > > On 2025-06-24 03:43:19 +0200, Tomas Vondra wrote: >> FWIW while looking into this, I tried running this under valgrind (on a >> regular 64-bit system, not in the chroot), and I get this report: >> >> ==65065==

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Tomas Vondra
On 6/24/25 13:10, Bertrand Drouvot wrote: > Hi, > > On Tue, Jun 24, 2025 at 11:20:15AM +0200, Tomas Vondra wrote: >> On 6/24/25 10:24, Bertrand Drouvot wrote: >>> Yeah, same for me with pg_get_shmem_allocations_numa(). It works if >>> pg_numa_query_pages()

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-23 Thread Tomas Vondra
On 6/23/25 23:47, Tomas Vondra wrote: > ... > > Or maybe the 32-bit chroot on 64-bit host matters and confuses some > calculation. > I think it's likely something like this. I noticed that if I modify pg_buffercache_numa_pages() to query the addresses one by one, it works.

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-23 Thread Tomas Vondra
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:394 > > Repeated calls are fine. > Huh. So it's only the first call that does this? Can you maybe print the addresses passed to pg_numa_query_pages? I wonder if there's some bug in how we fill that array. Not sure why it would happen only on 32-bit systems, though. I'll create a 32-bit VM so that I can try reproducing this. regards -- Tomas Vondra

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-23 Thread Tomas Vondra
On 6/23/25 23:25, Christoph Berg wrote: > Re: Tomas Vondra >> True. If it fails on first call, but succeeds on the other, then the >> problem is likely somewhere else. But also on the second call we won't >> do the memory touching. Can you try setting firstNumaTouch=fa

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-23 Thread Tomas Vondra
On 6/23/25 22:31, Christoph Berg wrote: > Re: Tomas Vondra >> Huh. So it's only the first call that does this? > > The first call after a restart. Reconnecting is not enough. > >> Can you maybe print the addresses passed to pg_numa_query_pages? I > > The a

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-23 Thread Tomas Vondra
On 6/23/25 22:51, Christoph Berg wrote: > Re: Tomas Vondra >> Didn't you say the first ~35 addresses succeed, right? What about the >> addresses after that? > > That was pg_shmem_allocations_numa. The pg_numa_query_pages() in there > works (does not return -1)

Re: Amcheck verification of GiST and GIN

2025-06-17 Thread Tomas Vondra
On 6/17/25 16:19, Thom Brown wrote: > On Mon, 16 Jun 2025 at 21:00, Tomas Vondra wrote: >> >> On 6/16/25 21:09, Arseniy Mukhin wrote: >>> On Mon, Jun 16, 2025 at 6:58 PM Tomas Vondra wrote: >>>> >>>> Thanks. >>>> >>>>

Re: Avoid possible dereference null pointer (src/backend/utils/cache/relcache.c)

2025-06-17 Thread Tomas Vondra
it will probably work fine. The catalog is borked, and who knows in what way. My opinion is that adding an "elog(ERROR)" here would be misleading, as it implies it's something we expect. And mostly pointless. I can imagine adding an Assert, but I don't quite see how that is better than just hitting a segfault a couple lines later. regards -- Tomas Vondra

Re: Amcheck verification of GiST and GIN

2025-06-16 Thread Tomas Vondra
On 6/16/25 21:09, Arseniy Mukhin wrote: > On Mon, Jun 16, 2025 at 6:58 PM Tomas Vondra wrote: >> >> Thanks. >> >> I went through the patches, polished the commit messages and did some >> minor tweaks in patch 0002 (to make the variable names a bit more >> co

Re: No error checking when reading from file using zstd in pg_dump

2025-06-16 Thread Tomas Vondra
ommits this week, but considering I missed the issues before commit ... For a moment I was worried about breaking ABI when fixing this in the backbranches, but I guess that's not an issue for tools like pg_dump. regards -- Tomas Vondra

Re: No error checking when reading from file using zstd in pg_dump

2025-06-16 Thread Tomas Vondra
uced this API, but it's definitely the case it was based on the initial gzip code. Regarding the Z_NULL, I believe it has always been ignored like this, at least since 9.1. The code simply returns what gzgets() returns, and then compares that to NULL, etc. Is there a better way to deal with Z_NULL? I suppose we could explicitly check/translate Z_NULL to NULL, although Z_NULL is simply defined as 0. I don't recall if NULL has some additional magic. regards -- Tomas Vondra
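For reference, a minimal sketch of what an explicit check could look like; this is not the pg_dump code, and since Z_NULL is defined as 0 in zlib.h, comparing the gzgets() result against NULL already covers it, so the only real gain is the gzerror() reporting:

    #include <stdio.h>
    #include <zlib.h>

    /* read one line from a gzip-compressed file, reporting read errors */
    static char *
    gz_gets_checked(gzFile fp, char *buf, int len)
    {
        char       *ret = gzgets(fp, buf, len);

        if (ret == Z_NULL)              /* Z_NULL is 0, i.e. the same as NULL */
        {
            int         errnum;
            const char *msg = gzerror(fp, &errnum);

            if (errnum != Z_OK && errnum != Z_STREAM_END)
                fprintf(stderr, "could not read from compressed file: %s\n", msg);
            return NULL;
        }
        return ret;
    }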

Re: Amcheck verification of GiST and GIN

2025-06-16 Thread Tomas Vondra
read through the commit messages, and let me know if I got some of the details wrong (or not clear enough). Otherwise I plan to start pushing this soon (~tomorrow). regards -- Tomas Vondra From cb24bb068582a39df9e9e59c2a9347889e896cf2 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 9 Jun

Re: Improve CRC32C performance on SSE4.2

2025-06-14 Thread Tomas Vondra
On 6/14/25 15:56, Nathan Bossart wrote: > On Sat, Jun 14, 2025 at 03:47:33PM +0200, Tomas Vondra wrote: >> I suggest you try with a newer gcc, perhaps 13.4. There's been a bunch >> of fixes related to AVX512 since 13.0, chances are this was already >> fixed. I don'

Re: Handling OID Changes in Regression Tests for C Extensions

2025-06-14 Thread Tomas Vondra
The OIDs for user-defined objects (e.g. those from extensions) are not stable, and this will not change. The only way is to keep them out of the test output, e.g. by not including OIDs in the results, and eliminating all other types of non-determinism - e.g. by enforcing ordering, etc. regards -- Tomas Vondra

Re: Improve CRC32C performance on SSE4.2

2025-06-14 Thread Tomas Vondra
> the current master, everything is fine. Does anyone know the reason? > > The attached is my config.log. > > > -- Tomas Vondra

Re: Amcheck verification of GiST and GIN

2025-06-09 Thread Tomas Vondra
On 6/9/25 00:14, Tomas Vondra wrote: > ... > > I propose to split it like this, into three parts, each addressing a > particular type of mistake: > > 1) gin_check_posting_tree_parent_keys_consistency > > 2) gin_check_parent_keys_consistency

Re: amcheck support for BRIN indexes

2025-06-08 Thread Tomas Vondra
hink these tests are > portable. While writing tests some minor issues were found and fixed. > Also ci compiler warnings were fixed. > Thanks. I've added myself as a reviewer, so that I don't forget about this for the next CF. regards -- Tomas Vondra

Re: Amcheck verification of GiST and GIN

2025-06-08 Thread Tomas Vondra
On 5/29/25 13:53, Arseniy Mukhin wrote: > On Mon, May 26, 2025 at 7:28 PM Arseniy Mukhin > wrote: >> On Mon, May 26, 2025 at 1:27 PM Tomas Vondra wrote: >>> Also, I've noticed that the TAP test passes even with some (most) of the >>> verify_gin.c changes rever

Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

2025-06-04 Thread Tomas Vondra
On 6/4/25 19:59, Jim Nasby wrote: > > > On Fri, May 23, 2025 at 4:29 PM Tomas Vondra wrote: > > Also, Alvaro seemed to think TAM is the way to go, and in order to keep > the OLTP performance he suggested to use both heap and VCI

Re: strange perf regression with data checksums

2025-06-04 Thread Tomas Vondra
son, and treated it as "normal". But with the default changes, it'll be easier to spot once they upgrade to PG18. So better to get this in now, otherwise we may have to wait until PG19, because of ABI (the patch adds a field into BTScanPosData, but maybe it'd be possible to add it into padding, not sure). regards -- Tomas Vondra

Re: [PING] fallocate() causes btrfs to never compress postgresql files

2025-05-31 Thread Tomas Vondra
ow which ones to set, a lot of the knowledge is somewhat outdated I think. Wouldn't it be better for btrfs to just start returning EOPNOTSUPP (maybe with a mount option), in which case we already do the right thing automatically? Sure, it means the admin needs to be aware of this in both cases. regards -- Tomas Vondra
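The "right thing" here is the existing fallback to writing zeroes when the filesystem rejects fallocate(); a simplified sketch of that pattern (not the actual fd.c code) could look like this:

    #include <errno.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /*
     * Extend a file by 'len' bytes at 'offset': prefer posix_fallocate(),
     * but fall back to writing zeroes if the filesystem rejects it.
     */
    static int
    extend_file(int fd, off_t offset, off_t len)
    {
        int         rc = posix_fallocate(fd, offset, len);
        char        zeros[8192];

        if (rc == 0)
            return 0;
        if (rc != EINVAL && rc != EOPNOTSUPP)
            return rc;                  /* a real error, not "unsupported" */

        /* fallback: plain writes, which keep btrfs compression working */
        memset(zeros, 0, sizeof(zeros));
        while (len > 0)
        {
            size_t      chunk = (len > (off_t) sizeof(zeros)) ? sizeof(zeros) : (size_t) len;
            ssize_t     written = pwrite(fd, zeros, chunk, offset);

            if (written < 0)
                return errno;
            offset += written;
            len -= written;
        }
        return 0;
    }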

Re: [PING] fallocate() causes btrfs to never compress postgresql files

2025-05-28 Thread Tomas Vondra
efully will > not affect postgres (see CAVEATS in man 3 posix_fallocate). > Well, if btrfs starts returning EOPNOTSUPP, and glibc switches to the userspace fallback, we wouldn't notice. But that's up to the btrfs to decide if they want to support fallocate. We still need our fallback anyway, because of other OSes. regards -- Tomas Vondra

Re: Non-reproducible AIO failure

2025-05-27 Thread Tomas Vondra
u run these tests in parallel. Can you share the patch/script? thanks -- Tomas Vondra

Re: Amcheck verification of GiST and GIN

2025-05-26 Thread Tomas Vondra
e the TAP test to trigger this too? To show the current code (in master) misses this? Grigory, Andrey, Heikki, any opinions on the tweaks? regards -- Tomas Vondra From 973de3eaeeca7ff2946a5b0f92f481d70ba5b78d Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 26 May 2025 12:10:37 +0200 Subje

Re: Hash table scans outside transactions

2025-05-25 Thread Tomas Vondra
that break the seqscan? FWIW I think with the use case from the beginning of this thread: 1. Add/update/remove entries in hash table 2. Scan the existing entries and perform one transaction per entry 3. Close scan Why not simply build a linked list after step (1)? regards -- Tomas Vondra
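A minimal sketch of the linked-list idea, collecting dynahash entries into a List before running the per-entry transactions (names are illustrative, not from the patch under discussion):

    /* assumes postgres.h, nodes/pg_list.h and utils/hsearch.h are included */

    /*
     * Collect pointers to all current entries, so the per-entry transactions
     * can iterate the List instead of keeping a hash table scan open across
     * transaction boundaries.
     */
    static List *
    collect_hash_entries(HTAB *htab)
    {
        HASH_SEQ_STATUS seq;
        void       *entry;
        List       *entries = NIL;

        hash_seq_init(&seq, htab);
        while ((entry = hash_seq_search(&seq)) != NULL)
            entries = lappend(entries, entry);

        /* the scan ends automatically once hash_seq_search() returns NULL */
        return entries;
    }

One caveat: the collected pointers stay valid only as long as the entries themselves are not removed from the table, which would have to hold for this approach to work.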

Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

2025-05-23 Thread Tomas Vondra
r.c:115:28: error: assignment to ‘ExecutorStart_hook_type’ {aka ‘void (*)(QueryDesc *, int)’} from incompatible pointer type ‘_Bool (*)(QueryDesc *, int)’ [-Wincompatible-pointer-types] 115 | ExecutorStart_hook = vci_executor_start_routine; |^ executor/vci_executor.c: In function ‘vci_executor_start_routine’: executor/vci_executor.c:161:28: error: void value not ignored as it ought to be 161 | plan_valid = executor_start_prev(queryDesc, eflags); |^ executor/vci_executor.c:163:28: error: void value not ignored as it ought to be 163 | plan_valid = standard_ExecutorStart(queryDesc, eflags); |^ make: *** [../../src/Makefile.global:973: executor/vci_executor.o] Error 1 The extension is not added to contrib/Makefile, so "make world" does not trigger this failure. regards -- Tomas Vondra

Re: Enable data checksums by default

2025-05-23 Thread Tomas Vondra
xisting tooling? I mean, there's pretty much just one thing the user can do to make it work, and that's disabling checksums. Sure, they might also enable checksums on the old cluster, but that makes the upgrade much longer, and presumably they use pg_upgrade to upgrade quickly. That being said, I don't feel very strongly about this, so if the consensus is to just error-out, so be it. regards -- Tomas Vondra

Re: Enable data checksums by default

2025-05-23 Thread Tomas Vondra
Isn't the whole point of that change to keep the current workflow working? Also, I'm not sure if "no feedback about this" is reliable. I have no clue if people did any significant testing. Maybe people did a lot of testing and the current state is fine. But it's more likely there was little testing, in which case "no feedback" says nothing. FWIW I would be +0.5 to just let pg_upgrade disable checksums. regards -- Tomas Vondra

Re: generic plans and "initial" pruning

2025-05-22 Thread Tomas Vondra
OK with that in principle, assuming the benefits outweigh the risk of making backpatching harder. The patches don't seem exceptionally large / invasive, but I don't know how often we modify these parts. regards -- Tomas Vondra

Re: plan shape work

2025-05-20 Thread Tomas Vondra
"why was the index not used", and the possible answers include "dominated by cost by another path" or "does not match the index keys" etc. I wonder if this work might be useful for something like that. regards -- Tomas Vondra

Re: Please update the pgconf.dev Unconference notes

2025-05-20 Thread Tomas Vondra
ved too quickly in different directions for me to catch all the details, so the notes have gaps etc. If others can improve that / clarify, that'd be great. regards -- Tomas Vondra

Re: generic plans and "initial" pruning

2025-05-20 Thread Tomas Vondra
ts > seem to be reality. The second attached file is a test case that > triggers > > ... FYI I added this as a PG18 open item: https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items regards -- Tomas Vondra

Re: wrong query results on bf leafhopper

2025-05-20 Thread Tomas Vondra
good to kick this one out the pool if there's hardware issues. > There are tools like "stress" and "stressant", etc. Works on my rpi5, but depends on the packager. I'd probably just look at dmesg first. In my experience hardware issues are often pretty visible there - reports of failed I/O requests, thermal issues on the CPU, that kind of stuff. regards -- Tomas Vondra

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
On 5/19/25 22:29, Peter Geoghegan wrote: > On Mon, May 19, 2025 at 4:17 PM Tomas Vondra wrote: >> Same effect as v1 for IOS, with regular index scans I see this: >> >> 64 clients: 0.7M tps >> 96 clients: 1.5M tps >> >> So very similar improvement as for IO

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
On 5/19/25 20:44, Peter Geoghegan wrote: > On Mon, May 19, 2025 at 2:19 PM Peter Geoghegan wrote: >> On Mon, May 19, 2025 at 2:01 PM Tomas Vondra wrote: >>> The regular index scan however still have this issue, although it's not >>> as visible as for IOS. >

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
mentioned maybe we could add an atomic variable tracking the page LSN, so that we don't have to obtain the header lock. I didn't have time to try yet. regards -- Tomas Vondra

Re: strange perf regression with data checksums

2025-05-19 Thread Tomas Vondra
dr, buf_state); AFAICS the lock is needed simply to read a consistent value from the page header, but maybe we could have an atomic variable with a copy of the LSN in the buffer descriptor? regards -- Tomas Vondra | --91.21%--btgettuple
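A sketch of the atomic-LSN idea; this is purely hypothetical, as no such field exists in BufferDesc today:

    /* assumes postgres.h, port/atomics.h and storage/bufpage.h are included */

    typedef struct BufferDescWithLsn
    {
        /* ... existing BufferDesc fields would go here ... */
        pg_atomic_uint64 page_lsn_copy;     /* mirror of the page header LSN */
    } BufferDescWithLsn;

    /* writer side: keep the copy in sync whenever the page LSN is advanced */
    static inline void
    buffer_set_lsn(BufferDescWithLsn *buf, Page page, XLogRecPtr lsn)
    {
        PageSetLSN(page, lsn);
        pg_atomic_write_u64(&buf->page_lsn_copy, lsn);
    }

    /* reader side: a consistent 8-byte read without the buffer header lock */
    static inline XLogRecPtr
    buffer_get_lsn_unlocked(BufferDescWithLsn *buf)
    {
        return pg_atomic_read_u64(&buf->page_lsn_copy);
    }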

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-11 Thread Tomas Vondra
On 5/11/25 18:07, Peter Geoghegan wrote: > On Sat, May 10, 2025 at 10:59 AM Tomas Vondra wrote: >> But doesn't it also highlight how fragile this memory allocation is? The >> skip scan patch didn't do anything wrong - it just added a couple >> fields, using a lit

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-10 Thread Tomas Vondra
ibc libraries). Still, it's a long-standing behavior, and I doubt it's likely to change. But considering glibc is what most systems use, maybe we should add some protections? I recall there were proposals to add an optional mallopt() call to set M_TOP_PAD when running on glibc. Maybe we should revive that. I also had a patch to add a "memory pool", which fixed this as a side effect. regards -- Tomas Vondra results.pdf Description: Adobe PDF document
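A sketch of the mallopt() protection being referred to; mallopt() and M_TOP_PAD are glibc-specific, so it would have to be guarded, and the 4MB value merely mirrors the MALLOC_TOP_PAD_ setting that helped in these tests:

    #include <stdlib.h>
    #ifdef __GLIBC__
    #include <malloc.h>
    #endif

    /*
     * Ask glibc malloc to keep extra headroom when extending the heap, so
     * that repeated allocate/free cycles near the heap top don't keep
     * trimming and re-extending it via brk().
     */
    static void
    maybe_set_malloc_top_pad(void)
    {
    #if defined(__GLIBC__) && defined(M_TOP_PAD)
        (void) mallopt(M_TOP_PAD, 4 * 1024 * 1024);
    #endif
    }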

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
> Yes. Have you tried reproducing the issue? It'd be good if someone else reproduced this independently, to confirm I'm not hallucinating. > @Tomas > Given the impact of MALLOC_TOP_PAD_, have you tested with other values > of MALLOC_TOP_PAD_? > I tried, and it seems 4MB

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
On 5/9/25 18:36, Peter Geoghegan wrote: > On Fri, May 9, 2025 at 12:28 PM Tomas Vondra wrote: >> Not sure if it matters, but this uses index-only scans, and the pages >> are all-visible, so maybe it's not much more expensive. > > You're still going to have to s

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
tine to nbtree was. It does not remove skip scan itself (that > should still work with queries that are actually eligible to use skip > scan, albeit slightly less efficiently with some opclasses). > Tried, doesn't seem to affect the results at all. -- Tomas Vondra

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
On 5/9/25 17:55, Peter Geoghegan wrote: > On Fri, May 9, 2025 at 10:57 AM Tomas Vondra wrote: >> I see the regression even with variants that actually match some rows. >> For example if I do this: > >> so that the query matches 100 rows, I get the same behavior. > &

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
n with variants that actually match some rows. For example if I do this: update pgbench_accounts set bid = aid; vacuum full; and change the query to search for "bid = 1", I get exactly the same behavior. Even with update pgbench_accounts set bid = aid / 100; vacuum full; so that the query matches 100 rows, I get the same behavior. -- Tomas Vondra

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-05-09 Thread Tomas Vondra
On 5/9/25 16:17, Peter Geoghegan wrote: > On Fri, May 9, 2025 at 8:58 AM Tomas Vondra wrote: >> I'm also not sure about the root cause, but while investigating it one >> of the experiments I tried was tweaking the glibc malloc by setting >> >> export
