Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Robert Haas
On Thu, Jul 29, 2021 at 5:11 AM Masahiko Sawada wrote: > Indeed. Given that the radix tree itself has other use cases, I have > no concern about using radix tree for vacuum's dead tuples storage. It > will be better to have one that can be generally used and has some > optimizations that are helpf

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Yura Sokolov
Yura Sokolov писал 2021-07-29 18:29: I've attached IntegerSet2 patch for pgtools repo and benchmark results. Branch https://github.com/funny-falcon/pgtools/tree/integerset2 Strange web-mail client... I never can be sure what it will attach... Reattach benchmark results regards, Yura Sokol

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Yura Sokolov
Masahiko Sawada писал 2021-07-29 17:29: On Thu, Jul 29, 2021 at 8:03 PM Yura Sokolov wrote: Masahiko Sawada писал 2021-07-29 12:11: > On Thu, Jul 29, 2021 at 3:53 AM Andres Freund > wrote: >> >> Hi, >> >> On 2021-07-27 13:06:56 +0900, Masahiko Sawada wrote: >> > Apart from performance and mem

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Masahiko Sawada
On Thu, Jul 29, 2021 at 8:03 PM Yura Sokolov wrote: > > Masahiko Sawada писал 2021-07-29 12:11: > > On Thu, Jul 29, 2021 at 3:53 AM Andres Freund > > wrote: > >> > >> Hi, > >> > >> On 2021-07-27 13:06:56 +0900, Masahiko Sawada wrote: > >> > Apart from performance and memory usage points of view,

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Yura Sokolov
Masahiko Sawada писал 2021-07-29 12:11: On Thu, Jul 29, 2021 at 3:53 AM Andres Freund wrote: Hi, On 2021-07-27 13:06:56 +0900, Masahiko Sawada wrote: > Apart from performance and memory usage points of view, we also need > to consider the reusability of the code. When I started this thread, I

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-29 Thread Masahiko Sawada
On Thu, Jul 29, 2021 at 3:53 AM Andres Freund wrote: > > Hi, > > On 2021-07-27 13:06:56 +0900, Masahiko Sawada wrote: > > Apart from performance and memory usage points of view, we also need > > to consider the reusability of the code. When I started this thread, I > > thought the best data struct

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-28 Thread Andres Freund
Hi, On 2021-07-27 13:06:56 +0900, Masahiko Sawada wrote: > Apart from performance and memory usage points of view, we also need > to consider the reusability of the code. When I started this thread, I > thought the best data structure would be the one optimized for > vacuum's dead tuple storage. H

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-28 Thread Andres Freund
Hi, On 2021-07-25 19:07:18 +0300, Yura Sokolov wrote: > I've dreamed to write more compact structure for vacuum for three > years, but life didn't give me a time to. > > Let me join to friendly competition. > > I've bet on HATM approach: popcount-ing bitmaps for non-empty elements. My concern

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-26 Thread Yura Sokolov
Masahiko Sawada писал 2021-07-27 07:06: On Mon, Jul 26, 2021 at 11:01 PM Masahiko Sawada wrote: I'll experiment with the proposed ideas including this idea in more scenarios and share the results tomorrow. I've done some benchmarks for proposed data structures. In this trial, I've done with

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-26 Thread Masahiko Sawada
On Mon, Jul 26, 2021 at 11:01 PM Masahiko Sawada wrote: > > I'll experiment with the proposed ideas including this idea in more > scenarios and share the results tomorrow. > I've done some benchmarks for proposed data structures. In this trial, I've done with the scenario where dead tuples are co

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-26 Thread Masahiko Sawada
On Mon, Jul 26, 2021 at 1:07 AM Yura Sokolov wrote: > > Hi, > > I've dreamed to write more compact structure for vacuum for three > years, but life didn't give me a time to. > > Let me join to friendly competition. > > I've bet on HATM approach: popcount-ing bitmaps for non-empty elements. Thank

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-25 Thread Yura Sokolov
Hi, I've dreamed to write more compact structure for vacuum for three years, but life didn't give me a time to. Let me join to friendly competition. I've bet on HATM approach: popcount-ing bitmaps for non-empty elements. Novelties: - 32 consecutive pages are stored together in a single sparse

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-19 Thread Andres Freund
Hi, On 2021-07-19 16:49:15 -0700, Andres Freund wrote: > E.g. for > > select prepare( > 100, -- max block > 20, -- # of dead tuples per page > 10, -- dead tuples interval within a page > 1 -- page inteval > ); > attach sizeshuffled ordered > array69 ms 120 MB 84.87 s

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-19 Thread Andres Freund
Hi, On 2021-07-19 15:20:54 +0900, Masahiko Sawada wrote: > BTW is the implementation of the radix tree approach available > somewhere? If so I'd like to experiment with that too. > > > > > I have toyed with implementing adaptively large radix nodes like > > proposed in https://db.in.tum.de/~leis/p

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-18 Thread Masahiko Sawada
Sorry for the late reply. On Sat, Jul 10, 2021 at 11:55 AM Andres Freund wrote: > > Hi, > > On 2021-07-09 10:17:49 -0700, Andres Freund wrote: > > On 2021-07-07 20:46:38 +0900, Masahiko Sawada wrote: > > > Currently, the TIDs of dead tuples are stored in an array that is > > > collectively alloca

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-10 Thread Masahiko Sawada
On Sat, Jul 10, 2021 at 2:17 AM Andres Freund wrote: > > Hi, > > On 2021-07-07 20:46:38 +0900, Masahiko Sawada wrote: > > Currently, the TIDs of dead tuples are stored in an array that is > > collectively allocated at the start of lazy vacuum and TID lookup uses > > bsearch(). There are the follow

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-09 Thread Andres Freund
Hi, On 2021-07-09 10:17:49 -0700, Andres Freund wrote: > On 2021-07-07 20:46:38 +0900, Masahiko Sawada wrote: > > Currently, the TIDs of dead tuples are stored in an array that is > > collectively allocated at the start of lazy vacuum and TID lookup uses > > bsearch(). There are the following chal

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-09 Thread Andres Freund
Hi, On 2021-07-07 20:46:38 +0900, Masahiko Sawada wrote: > Currently, the TIDs of dead tuples are stored in an array that is > collectively allocated at the start of lazy vacuum and TID lookup uses > bsearch(). There are the following challenges and limitations: > So I prototyped a new data struc

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-09 Thread Masahiko Sawada
On Thu, Jul 8, 2021 at 10:40 PM Hannu Krosing wrote: > > Very nice results. > > I have been working on the same problem but a bit different solution - > a mix of binary search for (sub)pages and 32-bit bitmaps for > tid-in-page. > > Even with currebnt allocation heuristics (allocate 291 tids per p

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-09 Thread Masahiko Sawada
On Thu, Jul 8, 2021 at 7:51 AM Peter Geoghegan wrote: > > On Wed, Jul 7, 2021 at 1:24 PM Peter Geoghegan wrote: > > I wonder how much it would help to break up that loop into two loops. > > Make the callback into a batch operation that generates state that > > describes what to do with each and e

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-09 Thread Masahiko Sawada
On Fri, Jul 9, 2021 at 2:37 PM Andres Freund wrote: > > Hi, > > On 2021-07-08 20:53:32 -0700, Andres Freund wrote: > > On 2021-07-07 20:46:38 +0900, Masahiko Sawada wrote: > > > 1. Don't allocate more than 1GB. There was a discussion to eliminate > > > this limitation by using MemoryContextAllocHu

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-08 Thread Masahiko Sawada
On Fri, Jul 9, 2021 at 12:53 PM Andres Freund wrote: > > Hi, > > > On 2021-07-07 20:46:38 +0900, Masahiko Sawada wrote: > > 1. Don't allocate more than 1GB. There was a discussion to eliminate > > this limitation by using MemoryContextAllocHuge() but there were > > concerns about point 2[1]. > > >

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-08 Thread Andres Freund
Hi, On 2021-07-08 20:53:32 -0700, Andres Freund wrote: > On 2021-07-07 20:46:38 +0900, Masahiko Sawada wrote: > > 1. Don't allocate more than 1GB. There was a discussion to eliminate > > this limitation by using MemoryContextAllocHuge() but there were > > concerns about point 2[1]. > > > > 2. Allo

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-08 Thread Andres Freund
Hi, On 2021-07-07 20:46:38 +0900, Masahiko Sawada wrote: > 1. Don't allocate more than 1GB. There was a discussion to eliminate > this limitation by using MemoryContextAllocHuge() but there were > concerns about point 2[1]. > > 2. Allocate the whole memory space at once. > > 3. Slow lookup perfor

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-08 Thread Hannu Krosing
On Fri, Jul 9, 2021 at 12:34 AM Peter Geoghegan wrote: > ... > > I would say that 200 TIDs per leaf page is common and ~1350 TIDs per > leaf page is not uncommon (with deduplication). Seems like that might > be enough? Likely yes, and also it would have the nice property of not changing the index

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-08 Thread Peter Geoghegan
On Thu, Jul 8, 2021 at 1:53 PM Hannu Krosing wrote: > How I am approaching this is separating "page search" tyo run over a > (naturally) sorted array of 32 bit page pointers and only when the > page is found the indexes in this array are used to look up the > in-page bitmaps. > This allows the hea

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-08 Thread Peter Geoghegan
On Thu, Jul 8, 2021 at 1:47 AM Masahiko Sawada wrote: > As I wrote in the first email, I think there are two important factors > in index vacuuming performance: the performance to check if heap TID > that an index tuple points to is dead, and the number of times to > perform index bulk-deletion. T

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-08 Thread Hannu Krosing
Resending as forgot to send to the list (thanks Peter :) ) On Wed, Jul 7, 2021 at 10:24 PM Peter Geoghegan wrote: > > The loop inside btvacuumpage() makes each loop iteration call the > callback -- this is always a call to lazy_tid_reaped() in practice. > And that's where we do binary searches. T

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-08 Thread Hannu Krosing
Very nice results. I have been working on the same problem but a bit different solution - a mix of binary search for (sub)pages and 32-bit bitmaps for tid-in-page. Even with currebnt allocation heuristics (allocate 291 tids per page) it initially allocate much less space, instead of current 291*6

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-08 Thread Masahiko Sawada
On Thu, Jul 8, 2021 at 5:24 AM Peter Geoghegan wrote: > > On Wed, Jul 7, 2021 at 4:47 AM Masahiko Sawada wrote: > > Currently, the TIDs of dead tuples are stored in an array that is > > collectively allocated at the start of lazy vacuum and TID lookup uses > > bsearch(). There are the following c

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-07 Thread Masahiko Sawada
On Wed, Jul 7, 2021 at 11:25 PM Matthias van de Meent wrote: > > On Wed, 7 Jul 2021 at 13:47, Masahiko Sawada wrote: > > > > Hi all, > > > > Index vacuuming is one of the most time-consuming processes in lazy > > vacuuming. lazy_tid_reaped() is a large part among them. The attached > > the flame

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-07 Thread Peter Geoghegan
On Wed, Jul 7, 2021 at 1:24 PM Peter Geoghegan wrote: > I wonder how much it would help to break up that loop into two loops. > Make the callback into a batch operation that generates state that > describes what to do with each and every index tuple on the leaf page. > The first loop would build a

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-07 Thread Peter Geoghegan
On Wed, Jul 7, 2021 at 4:47 AM Masahiko Sawada wrote: > Currently, the TIDs of dead tuples are stored in an array that is > collectively allocated at the start of lazy vacuum and TID lookup uses > bsearch(). There are the following challenges and limitations: > > 1. Don't allocate more than 1GB. T

Re: [PoC] Improve dead tuple storage for lazy vacuum

2021-07-07 Thread Matthias van de Meent
On Wed, 7 Jul 2021 at 13:47, Masahiko Sawada wrote: > > Hi all, > > Index vacuuming is one of the most time-consuming processes in lazy > vacuuming. lazy_tid_reaped() is a large part among them. The attached > the flame graph shows a profile of a vacuum on a table that has one index > and 80 milli

[PoC] Improve dead tuple storage for lazy vacuum

2021-07-07 Thread Masahiko Sawada
Hi all, Index vacuuming is one of the most time-consuming processes in lazy vacuuming. lazy_tid_reaped() is a large part among them. The attached the flame graph shows a profile of a vacuum on a table that has one index and 80 million live rows and 20 million dead rows, where lazy_tid_reaped() acc

<    1   2   3   4   5