Re: O(n) tasks cause lengthy startups and checkpoints

2023-07-04 Thread Nathan Bossart
On Tue, Jul 04, 2023 at 09:30:43AM +0200, Daniel Gustafsson wrote: >> On 4 Apr 2023, at 05:36, Nathan Bossart wrote: >> >> I sent this one to the next commitfest and marked it as waiting-on-author >> and targeted for v17. I'm aiming to have something that addresses the >> latest feedback ready

Re: O(n) tasks cause lengthy startups and checkpoints

2023-07-04 Thread Daniel Gustafsson
> On 4 Apr 2023, at 05:36, Nathan Bossart wrote: > > I sent this one to the next commitfest and marked it as waiting-on-author > and targeted for v17. I'm aiming to have something that addresses the > latest feedback ready for the July commitfest. Have you had a chance to look at this such

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-03 Thread Nathan Bossart
I sent this one to the next commitfest and marked it as waiting-on-author and targeted for v17. I'm aiming to have something that addresses the latest feedback ready for the July commitfest. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Nathan Bossart
On Sun, Apr 02, 2023 at 04:37:38PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> It's been a little while since I dug into this, but I do see your point >> that the wraparound risk could be higher in some cases. For example, if >> you have a billion temp files to clean up, the custodian

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Nathan Bossart
On Sun, Apr 02, 2023 at 04:23:05PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> On Sun, Apr 02, 2023 at 01:40:05PM -0400, Tom Lane wrote: >>> * Why does LookupCustodianFunctions think it needs to search the >>> constant array? > >> The order of the tasks in the array isn't guaranteed to

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Tom Lane
Nathan Bossart writes: > It's been a little while since I dug into this, but I do see your point > that the wraparound risk could be higher in some cases. For example, if > you have a billion temp files to clean up, the custodian could be stuck on > that task for a long time. I will give this

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Tom Lane
Nathan Bossart writes: > On Sun, Apr 02, 2023 at 01:40:05PM -0400, Tom Lane wrote: >> * Why does LookupCustodianFunctions think it needs to search the >> constant array? > The order of the tasks in the array isn't guaranteed to match the order in > the CustodianTask enum. Why not? It's a
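The exchange above turns on whether a lookup has to search a constant array or can index it directly by enum value. One general C technique that makes an array's layout independent of the order in which its entries are written is designated initializers. The sketch below illustrates only that technique; every name in it is hypothetical and none is taken from the patch, and the truncated reply may have had something else in mind.

```c
#include <stddef.h>

/* Hypothetical task identifiers; not the patch's CustodianTask enum. */
typedef enum
{
    SKETCH_TASK_REMOVE_TEMP_FILES,
    SKETCH_TASK_REMOVE_SNAPSHOTS,
    SKETCH_TASK_COUNT           /* number of tasks, keep last */
} SketchTask;

typedef struct
{
    const char *name;
    int       (*run)(void);
} SketchTaskEntry;

/* Placeholder task bodies that just return a recognizable value. */
int sketch_run_temp_files(void) { return 1; }
int sketch_run_snapshots(void)  { return 2; }

/*
 * Designated initializers tie each entry to its enum value, so the order
 * in which entries are written here does not matter (it is deliberately
 * reversed) and no search is needed at lookup time.
 */
static const SketchTaskEntry sketch_tasks[SKETCH_TASK_COUNT] = {
    [SKETCH_TASK_REMOVE_SNAPSHOTS]  = { "remove snapshots",  sketch_run_snapshots },
    [SKETCH_TASK_REMOVE_TEMP_FILES] = { "remove temp files", sketch_run_temp_files },
};

/* Constant-time lookup; returns NULL for an out-of-range task value. */
const SketchTaskEntry *sketch_lookup(SketchTask task)
{
    if ((unsigned int) task >= SKETCH_TASK_COUNT)
        return NULL;
    return &sketch_tasks[task];
}
```

A caller would fetch an entry with sketch_lookup() and invoke its run pointer after checking for NULL.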

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Nathan Bossart
On Sun, Apr 02, 2023 at 11:42:26AM -0700, Andres Freund wrote: > Just want to note that I've repeatedly objected to 0002 and 0003, i.e. moving > serialized logical decoding snapshots and mapping files, to custodian, and > still do. Without further work it increases wraparound risks (the filenames

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Nathan Bossart
On Sun, Apr 02, 2023 at 01:40:05PM -0400, Tom Lane wrote: > I took a brief look through v20, and generally liked what I saw, > but there are a few things troubling me: Thanks for taking a look. > * The comments for CustodianEnqueueTask claim that it won't enqueue an > already-queued task, but I

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Andres Freund
Hi, On 2023-04-02 13:40:05 -0400, Tom Lane wrote: > Nathan Bossart writes: > > another rebase for cfbot > > I took a brief look through v20, and generally liked what I saw, > but there are a few things troubling me: Just want to note that I've repeatedly objected to 0002 and 0003, i.e. moving

Re: O(n) tasks cause lengthy startups and checkpoints

2023-04-02 Thread Tom Lane
Nathan Bossart writes: > another rebase for cfbot I took a brief look through v20, and generally liked what I saw, but there are a few things troubling me: * The comments for CustodianEnqueueTask claim that it won't enqueue an already-queued task, but I don't think I believe that, because it

Re: O(n) tasks cause lengthy startups and checkpoints

2023-02-17 Thread Nathan Bossart
On Thu, Feb 02, 2023 at 09:48:08PM -0800, Nathan Bossart wrote: > rebased for cfbot another rebase for cfbot -- Nathan Bossart Amazon Web Services: https://aws.amazon.com From 1c9b95cae7adcc57b7544a44ff16a26e71c6c736 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 5 Jan 2022 19:24:22

Re: O(n) tasks cause lengthy startups and checkpoints

2023-02-02 Thread Nathan Bossart
rebased for cfbot -- Nathan Bossart Amazon Web Services: https://aws.amazon.com From abbd26a3bcfcc828e196187e9f6abf6af64f3393 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 5 Jan 2022 19:24:22 + Subject: [PATCH v19 1/4] Introduce custodian. The custodian process is a new

Re: O(n) tasks cause lengthy startups and checkpoints

2022-12-05 Thread Bharath Rupireddy
On Sat, Dec 3, 2022 at 12:45 AM Nathan Bossart wrote: > > On Fri, Dec 02, 2022 at 12:11:35PM +0530, Bharath Rupireddy wrote: > > On Fri, Dec 2, 2022 at 3:10 AM Nathan Bossart > > wrote: > >> The test appears to reliably create snapshot and mapping files, so if the > >> directories are empty at

Re: O(n) tasks cause lengthy startups and checkpoints

2022-12-02 Thread Nathan Bossart
On Fri, Dec 02, 2022 at 12:11:35PM +0530, Bharath Rupireddy wrote: > On Fri, Dec 2, 2022 at 3:10 AM Nathan Bossart > wrote: >> The test appears to reliably create snapshot and mapping files, so if the >> directories are empty at some point after the checkpoint at the end, we can >> be reasonably

Re: O(n) tasks cause lengthy startups and checkpoints

2022-12-01 Thread Bharath Rupireddy
On Fri, Dec 2, 2022 at 3:10 AM Nathan Bossart wrote: > > >> 4. Is it a good idea to add log messages in the DoCustodianTasks() > >> loop? Maybe at a debug level? The log message can say the current task > >> the custodian is processing. And/Or setting the custodian's status on > >> the ps display

Re: O(n) tasks cause lengthy startups and checkpoints

2022-12-01 Thread Nathan Bossart
On Wed, Nov 30, 2022 at 05:27:10PM +0530, Bharath Rupireddy wrote: > On Wed, Nov 30, 2022 at 4:52 PM Bharath Rupireddy > wrote: >> Thanks for the patches. I spent some time on reviewing v17 patch set >> and here are my comments: Thanks for reviewing! >> 0001: >> 1. I think the custodian process

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-30 Thread Bharath Rupireddy
On Wed, Nov 30, 2022 at 4:52 PM Bharath Rupireddy wrote: > > On Wed, Nov 30, 2022 at 10:48 AM Nathan Bossart > wrote: > > > > > > cfbot is not happy with v16. AFAICT this is just due to poor placement, so > > here is another attempt with the tests moved to a new location. Apologies > > for the

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-30 Thread Bharath Rupireddy
On Wed, Nov 30, 2022 at 10:48 AM Nathan Bossart wrote: > > > cfbot is not happy with v16. AFAICT this is just due to poor placement, so > here is another attempt with the tests moved to a new location. Apologies > for the noise. Thanks for the patches. I spent some time on reviewing v17 patch

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-29 Thread Simon Riggs
On Wed, 30 Nov 2022 at 03:56, Nathan Bossart wrote: > > On Tue, Nov 29, 2022 at 12:02:44PM +, Simon Riggs wrote: > > The last important point for me is tests, in src/test/modules > > probably. It might be possible to reuse the final state of other > > modules' tests to test cleanup, or at

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-29 Thread Nathan Bossart
On Tue, Nov 29, 2022 at 07:56:53PM -0800, Nathan Bossart wrote: > On Tue, Nov 29, 2022 at 12:02:44PM +, Simon Riggs wrote: >> The last important point for me is tests, in src/test/modules >> probably. It might be possible to reuse the final state of other >> modules' tests to test cleanup, or

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-29 Thread Nathan Bossart
On Tue, Nov 29, 2022 at 12:02:44PM +, Simon Riggs wrote: > The last important point for me is tests, in src/test/modules > probably. It might be possible to reuse the final state of other > modules' tests to test cleanup, or at least integrate a custodian test > into each module. Of course.

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-29 Thread Simon Riggs
On Mon, 28 Nov 2022 at 23:40, Nathan Bossart wrote: > > Okay, here is a new patch set. 0004 adds logic to prevent custodian tasks > from delaying shutdown. That all seems good, thanks. The last important point for me is tests, in src/test/modules probably. It might be possible to reuse the

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-28 Thread Nathan Bossart
Okay, here is a new patch set. 0004 adds logic to prevent custodian tasks from delaying shutdown. I haven't added any logging for long-running tasks yet. Tasks might ordinarily take a while, so such logs wouldn't necessarily indicate something is wrong. Perhaps we could add a GUC for the

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-28 Thread Robert Haas
On Mon, Nov 28, 2022 at 1:31 PM Andres Freund wrote: > On 2022-11-28 13:08:57 +, Simon Riggs wrote: > > On Sun, 27 Nov 2022 at 23:34, Nathan Bossart > > wrote: > > > > Rather than explicitly use DEBUG1 everywhere I would have an > > > > #define CUSTODIAN_LOG_LEVEL LOG > > > > so we can

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-28 Thread Andres Freund
On 2022-11-28 13:08:57 +, Simon Riggs wrote: > On Sun, 27 Nov 2022 at 23:34, Nathan Bossart wrote: > > > Rather than explicitly use DEBUG1 everywhere I would have an > > > #define CUSTODIAN_LOG_LEVEL LOG > > > so we can run with it in LOG mode and then set it to DEBUG1 with a one > > >
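The suggestion quoted above is to route every custodian message through a single CUSTODIAN_LOG_LEVEL macro, so that switching between LOG and DEBUG1 is a one-line change. A minimal standalone sketch of that idea follows. It assumes a toy reporting function and toy severity values that are not PostgreSQL's; only the macro name comes from the thread.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical severities for this sketch; not PostgreSQL's elog levels. */
typedef enum
{
    SKETCH_DEBUG1 = 1,
    SKETCH_LOG = 2
} SketchLevel;

/* The single line to edit when the messages should become debug-only. */
#define CUSTODIAN_LOG_LEVEL SKETCH_LOG

/* Messages below this threshold are dropped (a stand-in for a log setting). */
SketchLevel sketch_min_level = SKETCH_LOG;

/* Toy reporting function: returns 1 if the message was emitted, 0 if dropped. */
int sketch_report(SketchLevel level, const char *fmt, ...)
{
    va_list ap;

    if (level < sketch_min_level)
        return 0;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    return 1;
}

/* Call sites name the macro, never a concrete level. */
int sketch_custodian_step(const char *task_name)
{
    return sketch_report(CUSTODIAN_LOG_LEVEL, "custodian: running %s", task_name);
}
```

With the macro set to SKETCH_LOG the message is emitted; redefining it as SKETCH_DEBUG1 would silence every call site at once under the same threshold.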

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-28 Thread Simon Riggs
On Sun, 27 Nov 2022 at 23:34, Nathan Bossart wrote: > > Thanks for taking a look! > > On Thu, Nov 24, 2022 at 05:31:02PM +, Simon Riggs wrote: > > * not sure I believe that everything it does can always be aborted out > > of and shutdown - to achieve that you will need a > >

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-27 Thread Nathan Bossart
Thanks for taking a look! On Thu, Nov 24, 2022 at 05:31:02PM +, Simon Riggs wrote: > * not sure I believe that everything it does can always be aborted out > of and shutdown - to achieve that you will need a > CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least I did

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-24 Thread Simon Riggs
On Thu, 24 Nov 2022 at 00:19, Nathan Bossart wrote: > > On Sun, Nov 06, 2022 at 02:38:42PM -0800, Nathan Bossart wrote: > > rebased > > another rebase for cfbot 0001 seems good to me * I like that it sleeps forever until requested * not sure I believe that everything it does can always be

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-23 Thread Nathan Bossart
On Sun, Nov 06, 2022 at 02:38:42PM -0800, Nathan Bossart wrote: > rebased another rebase for cfbot -- Nathan Bossart Amazon Web Services: https://aws.amazon.com From b2c36a6d0d8ca5cde374b1c8b34aafaabbd7f6c2 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 5 Jan 2022 19:24:22 +

Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-06 Thread Nathan Bossart
On Fri, Sep 23, 2022 at 10:41:54AM -0700, Nathan Bossart wrote: > v11 adds support for building with meson. rebased -- Nathan Bossart Amazon Web Services: https://aws.amazon.com From 367c5f3863457cfbd0fe8add0e8df3e630aaaea9 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 5 Jan 2022

Re: O(n) tasks cause lengthy startups and checkpoints

2022-09-23 Thread Nathan Bossart
On Fri, Sep 02, 2022 at 03:07:44PM -0700, Nathan Bossart wrote: > And another. v11 adds support for building with meson. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com From 56c9ff2bf1a6524518b62193c0da02372f9674a1 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 5 Jan

Re: O(n) tasks cause lengthy startups and checkpoints

2022-09-02 Thread Nathan Bossart
On Wed, Aug 24, 2022 at 09:46:24AM -0700, Nathan Bossart wrote: > Another rebase for cfbot. And another. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com From 63a470be1ac8af3b12684f136f70b2d7b6f87b81 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 5 Jan 2022 19:24:22

Re: O(n) tasks cause lengthy startups and checkpoints

2022-08-24 Thread Nathan Bossart
On Thu, Aug 11, 2022 at 04:09:21PM -0700, Nathan Bossart wrote: > Here is a rebased patch set for cfbot. There are no other differences > between v7 and v8. Another rebase for cfbot. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com From 6810355cb3d1a03326b152aebe3c907f7544be4f

Re: O(n) tasks cause lengthy startups and checkpoints

2022-08-11 Thread Nathan Bossart
On Wed, Jul 06, 2022 at 09:51:10AM -0700, Nathan Bossart wrote: > Here's a new revision where I've attempted to address all the feedback I've > received thus far. Notably, the custodian now uses a queue for registering > tasks and determining which tasks to execute. Other changes include >

Re: O(n) tasks cause lengthy startups and checkpoints

2022-07-06 Thread Nathan Bossart
Here's a new revision where I've attempted to address all the feedback I've received thus far. Notably, the custodian now uses a queue for registering tasks and determining which tasks to execute. Other changes include splitting the temporary file functions apart to avoid consecutive boolean

Re: O(n) tasks cause lengthy startups and checkpoints

2022-07-03 Thread Andres Freund
Hi, On 2022-07-03 10:07:54 -0700, Nathan Bossart wrote: > Thanks for the prompt review. > > On Sat, Jul 02, 2022 at 03:54:56PM -0700, Andres Freund wrote: > > On 2022-07-02 15:05:54 -0700, Nathan Bossart wrote: > >> + /* Obtain requested tasks */ > >> +

Re: O(n) tasks cause lengthy startups and checkpoints

2022-07-03 Thread Nathan Bossart
Hi Andres, Thanks for the prompt review. On Sat, Jul 02, 2022 at 03:54:56PM -0700, Andres Freund wrote: > On 2022-07-02 15:05:54 -0700, Nathan Bossart wrote: >> +/* Obtain requested tasks */ >> +SpinLockAcquire(&CustodianShmem->cust_lck); >> +flags =

Re: O(n) tasks cause lengthy startups and checkpoints

2022-07-02 Thread Andres Freund
Hi, On 2022-07-02 15:05:54 -0700, Nathan Bossart wrote: > + /* Obtain requested tasks */ > + SpinLockAcquire(&CustodianShmem->cust_lck); > + flags = CustodianShmem->cust_flags; > + CustodianShmem->cust_flags = 0; > + SpinLockRelease(&CustodianShmem->cust_lck); Just
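The quoted hunk reads a flags word and clears it while holding a spinlock, so each requested task is handed to the worker once. That take-and-clear pattern can be sketched outside PostgreSQL as below. This is an illustration under assumptions: the struct, the function names, and the toy spinlock built on C11 atomic_flag are all hypothetical stand-ins, not PostgreSQL's spinlock API or the patch's shared-memory layout.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a shared-memory struct; not the patch's layout. */
typedef struct
{
    atomic_flag lock;           /* toy spinlock protecting flags */
    uint32_t    flags;          /* one bit per requested task */
} SketchShmem;

void sketch_init(SketchShmem *s)
{
    atomic_flag_clear(&s->lock);
    s->flags = 0;
}

static void sketch_lock(SketchShmem *s)
{
    /* Spin until the flag was previously clear, i.e. we now own the lock. */
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;
}

static void sketch_unlock(SketchShmem *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Requester side: set a task bit under the lock. */
void sketch_request(SketchShmem *s, uint32_t task_bit)
{
    sketch_lock(s);
    s->flags |= task_bit;
    sketch_unlock(s);
}

/* Worker side: copy the pending bits and reset them in one critical section. */
uint32_t sketch_take_requests(SketchShmem *s)
{
    uint32_t    flags;

    sketch_lock(s);
    flags = s->flags;
    s->flags = 0;
    sketch_unlock(s);
    return flags;
}
```

Because the copy and the reset happen under the same lock, a request that arrives after the worker's read is kept for the next pass rather than lost.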

Re: O(n) tasks cause lengthy startups and checkpoints

2022-07-02 Thread Nathan Bossart
On Fri, Jun 24, 2022 at 11:45:22AM +0100, Simon Riggs wrote: > On Thu, 23 Jun 2022 at 18:15, Nathan Bossart wrote: >> I'm grateful for the discussion in this thread so far, but I'm not seeing a >> clear path forward. > > +1 to add the new auxiliary process. I went ahead and put together a new

Re: O(n) tasks cause lengthy startups and checkpoints

2022-06-24 Thread Simon Riggs
On Thu, 23 Jun 2022 at 18:15, Nathan Bossart wrote: > I'm grateful for the discussion in this thread so far, but I'm not seeing a > clear path forward. +1 to add the new auxiliary process. -- Simon Riggs http://www.EnterpriseDB.com/

Re: O(n) tasks cause lengthy startups and checkpoints

2022-06-23 Thread Nathan Bossart
On Thu, Jun 23, 2022 at 09:46:28AM -0400, Robert Haas wrote: > I do agree that a general mechanism for getting cleanup tasks done in > the background could be a useful thing to have, but I feel like it's > hard to see exactly how to make it work well. We can't just allow it > to spin up a million

Re: O(n) tasks cause lengthy startups and checkpoints

2022-06-23 Thread Simon Riggs
On Thu, 23 Jun 2022 at 14:46, Robert Haas wrote: > > On Thu, Jun 23, 2022 at 7:58 AM Simon Riggs > wrote: > > Having a central cleanup process makes a lot of sense. There is a long > > list of potential tasks for such a process. My understanding is that > > autovacuum already has an interface

Re: O(n) tasks cause lengthy startups and checkpoints

2022-06-23 Thread Robert Haas
On Thu, Jun 23, 2022 at 7:58 AM Simon Riggs wrote: > Having a central cleanup process makes a lot of sense. There is a long > list of potential tasks for such a process. My understanding is that > autovacuum already has an interface for handling additional workload > types, which is how BRIN

Re: O(n) tasks cause lengthy startups and checkpoints

2022-06-23 Thread Simon Riggs
On Fri, 18 Feb 2022 at 20:51, Nathan Bossart wrote: > > > On Thu, Feb 17, 2022 at 03:12:47PM -0800, Andres Freund wrote: > >>> > The improvements around deleting temporary files and serialized > >>> > snapshots > >>> > afaict don't require a dedicated process - they're only relevant during > >>>

Re: O(n) tasks cause lengthy startups and checkpoints

2022-03-17 Thread Nathan Bossart
It seems unlikely that anything discussed in this thread will be committed for v15, so I've adjusted the commitfest entry to v16 and moved it to the next commitfest. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-18 Thread Nathan Bossart
> On Thu, Feb 17, 2022 at 03:12:47PM -0800, Andres Freund wrote: >>> > The improvements around deleting temporary files and serialized snapshots >>> > afaict don't require a dedicated process - they're only relevant during >>> > startup. We could use the approach of renaming the directory out of

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-18 Thread Nathan Bossart
On Thu, Feb 17, 2022 at 03:12:47PM -0800, Andres Freund wrote: >> > The improvements around deleting temporary files and serialized snapshots >> > afaict don't require a dedicated process - they're only relevant during >> > startup. We could use the approach of renaming the directory out of the

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-17 Thread Andres Freund
Hi, On 2022-02-17 14:58:38 -0800, Nathan Bossart wrote: > On Thu, Feb 17, 2022 at 02:28:29PM -0800, Andres Freund wrote: > > As far as I understand, the primary concern are logical decoding serialized > > snapshots, because a lot of them can accumulate if there e.g. is an old > > unused > > /

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-17 Thread Nathan Bossart
On Thu, Feb 17, 2022 at 02:28:29PM -0800, Andres Freund wrote: > As far as I understand, the primary concern are logical decoding serialized > snapshots, because a lot of them can accumulate if there e.g. is an old unused > / far behind slot. It should be easy to reduce the number of those

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-17 Thread Andres Freund
Hi, On 2022-02-17 13:00:22 -0800, Nathan Bossart wrote: > Okay. So IIUC the problem might already exist today, but offloading these > tasks to a separate process could make it more likely. Vastly more, yes. Before checkpoints not happening would be a (but not a great) form of backpressure. You

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-17 Thread Nathan Bossart
On Thu, Feb 17, 2022 at 11:27:09AM -0800, Andres Freund wrote: > On 2022-02-17 10:23:37 -0800, Nathan Bossart wrote: >> On Wed, Feb 16, 2022 at 10:59:38PM -0800, Andres Freund wrote: >> > They're accessed by xid. The LSN is just for cleanup. Accessing files >> > left over from a previous

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-17 Thread Andres Freund
Hi, On 2022-02-17 10:23:37 -0800, Nathan Bossart wrote: > On Wed, Feb 16, 2022 at 10:59:38PM -0800, Andres Freund wrote: > > They're accessed by xid. The LSN is just for cleanup. Accessing files > > left over from a previous transaction with the same xid wouldn't be > > good - we'd read wrong

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-17 Thread Nathan Bossart
On Wed, Feb 16, 2022 at 10:59:38PM -0800, Andres Freund wrote: > On 2022-02-16 20:14:04 -0800, Nathan Bossart wrote: >> >> - while ((spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL) >> >> + while (!ShutdownRequestPending && >> >> +(spc_de = ReadDirExtended(spc_dir,

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-16 Thread Andres Freund
Hi, On 2022-02-16 20:14:04 -0800, Nathan Bossart wrote: > >> - while ((spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL) > >> + while (!ShutdownRequestPending && > >> + (spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != > >> NULL) > > > > Uh, huh? It strikes me as
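The quoted hunk adds a shutdown check to the condition of a directory-scanning loop. The shape of such a loop can be sketched with plain POSIX calls as below. It only illustrates what the quoted condition does and does not address the objection in the truncated reply. The flag and function names are hypothetical, and opendir()/readdir() stand in for PostgreSQL's own directory helpers.

```c
#include <dirent.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flag; a signal handler would set it in a real program. */
volatile sig_atomic_t sketch_shutdown_requested = 0;

/*
 * Count the entries of a directory, stopping early once shutdown has been
 * requested.  Returns -1 if the directory cannot be opened.
 */
int sketch_count_entries(const char *path)
{
    DIR        *dir = opendir(path);
    struct dirent *de;
    int         seen = 0;

    if (dir == NULL)
        return -1;

    /* The flag is tested first, so no entry is read after a request. */
    while (!sketch_shutdown_requested && (de = readdir(dir)) != NULL)
    {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        seen++;
    }

    closedir(dir);
    return seen;
}
```

One consequence of this shape is that a caller cannot tell a finished scan from an interrupted one by the return value alone.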

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-16 Thread Nathan Bossart
Hi Andres, I appreciate the feedback. On Wed, Feb 16, 2022 at 05:50:52PM -0800, Andres Freund wrote: >> +/* Since not using PG_TRY, must reset error stack by hand */ >> +if (sigsetjmp(local_sigjmp_buf, 1) != 0) >> +{ > > I also think it's a bad idea to introduce even more

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-16 Thread Andres Freund
Hi, On 2022-02-16 16:50:57 -0800, Nathan Bossart wrote: > + * The custodian process is new as of Postgres 15. I think this kind of comment tends to age badly and not be very useful. > It's main purpose is to > + * offload tasks that could otherwise delay startup and checkpointing, but > + * it

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-16 Thread Nathan Bossart
Here is another rebased patch set. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com From c11a6893d2d509df1389a1c03081b6cc563d5683 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 5 Jan 2022 19:24:22 + Subject: [PATCH v5 1/8] Introduce custodian. The custodian process

Re: O(n) tasks cause lengthy startups and checkpoints

2022-02-11 Thread Nathan Bossart
Here is a rebased patch set. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com From 506aa95dd77f16dc64d7fe9c52ca67dd3c10212e Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 5 Jan 2022 19:24:22 + Subject: [PATCH v4 1/8] Introduce custodian. The custodian process is a

Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-18 Thread Bossart, Nathan
On 1/14/22, 11:26 PM, "Bharath Rupireddy" wrote: > On Sat, Jan 15, 2022 at 12:46 AM Bossart, Nathan wrote: >> I'd personally like to avoid creating two code paths for the same >> thing. Are there really cases when this one extra auxiliary process >> would be too many? And if so, how would a

Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-14 Thread Bharath Rupireddy
On Sat, Jan 15, 2022 at 12:46 AM Bossart, Nathan wrote: > > On 1/14/22, 3:43 AM, "Maxim Orlov" wrote: > > The code seems to be in good condition. All the tests are running ok with > > no errors. > > Thanks for your review. > > > I like the whole idea of shifting additional checkpointer jobs as

Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-14 Thread Bossart, Nathan
On 1/14/22, 3:43 AM, "Maxim Orlov" wrote: > The code seems to be in good condition. All the tests are running ok with no > errors. Thanks for your review. > I like the whole idea of shifting additional checkpointer jobs as much as > possible to another worker. In my view, it is more

Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-14 Thread Maxim Orlov
The code seems to be in good condition. All the tests are running ok with no errors. I like the whole idea of shifting additional checkpointer jobs as much as possible to another worker. In my view, it is more appropriate to call this worker "bg cleaner" or "bg file cleaner" or smth. It could be

Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-02 Thread Amul Sul
On Mon, Jan 3, 2022 at 2:56 AM Andres Freund wrote: > > Hi, > > On 2021-12-14 20:23:57 +, Bossart, Nathan wrote: > > As promised, here is v2. This patch set includes handling for all > > four tasks noted upthread. I'd still consider this a work-in- > > progress, as I've done minimal

Re: O(n) tasks cause lengthy startups and checkpoints

2022-01-02 Thread Andres Freund
Hi, On 2021-12-14 20:23:57 +, Bossart, Nathan wrote: > As promised, here is v2. This patch set includes handling for all > four tasks noted upthread. I'd still consider this a work-in- > progress, as I've done minimal testing. At the very least, it should > demonstrate what an auxiliary

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-14 Thread Bossart, Nathan
On 12/14/21, 12:09 PM, "Bossart, Nathan" wrote: > On 12/14/21, 9:00 AM, "Bruce Momjian" wrote: >> Have we changed temporary file handling in any recent major releases, >> meaning is this a current problem or one already improved in PG 14. > > I haven't noticed any recent improvements while

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-14 Thread Bossart, Nathan
On 12/14/21, 9:00 AM, "Bruce Momjian" wrote: > Have we changed temporary file handling in any recent major releases, > meaning is this a current problem or one already improved in PG 14. I haven't noticed any recent improvements while working in this area. Nathan

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-14 Thread Bruce Momjian
On Mon, Dec 13, 2021 at 11:05:46PM +, Bossart, Nathan wrote: > On 12/13/21, 12:37 PM, "Robert Haas" wrote: > > On Mon, Dec 13, 2021 at 1:21 PM Bossart, Nathan wrote: > >> > But against all that, if these tasks are slowing down checkpoints and > >> > that's avoidable, that seems pretty

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Bossart, Nathan
On 12/13/21, 12:37 PM, "Robert Haas" wrote: > On Mon, Dec 13, 2021 at 1:21 PM Bossart, Nathan wrote: >> > But against all that, if these tasks are slowing down checkpoints and >> > that's avoidable, that seems pretty important too. Interestingly, I >> > can't say that I've ever seen any of these

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Robert Haas
On Mon, Dec 13, 2021 at 1:21 PM Bossart, Nathan wrote: > > But against all that, if these tasks are slowing down checkpoints and > > that's avoidable, that seems pretty important too. Interestingly, I > > can't say that I've ever seen any of these things be a problem for > > checkpoint or startup

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Bossart, Nathan
On 12/13/21, 9:20 AM, "Justin Pryzby" wrote: > On Mon, Dec 13, 2021 at 08:53:37AM -0500, Robert Haas wrote: >> Another issue is that we don't want to increase the number of >> processes without bound. Processes use memory and CPU resources and if >> we run too many of them it becomes a burden on

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Bossart, Nathan
On 12/13/21, 5:54 AM, "Robert Haas" wrote: > I don't know whether this kind of idea is good or not. Thanks for chiming in. I have an almost-complete patch set that I'm hoping to post to the lists in the next couple of days. > One thing we've seen a number of times now is that entrusting the

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Justin Pryzby
On Mon, Dec 13, 2021 at 08:53:37AM -0500, Robert Haas wrote: > On Fri, Dec 10, 2021 at 2:03 PM Bossart, Nathan wrote: > > Well, I haven't had a chance to look at your patch, and my patch set > > still only has handling for CheckPointSnapBuild() and > > RemovePgTempFiles(), but I thought I'd share

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Robert Haas
On Fri, Dec 10, 2021 at 2:03 PM Bossart, Nathan wrote: > Well, I haven't had a chance to look at your patch, and my patch set > still only has handling for CheckPointSnapBuild() and > RemovePgTempFiles(), but I thought I'd share what I have anyway. I > split it into 5 patches: > > 0001 - Adds a

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-06 Thread Bossart, Nathan
On 12/6/21, 3:44 AM, "Bharath Rupireddy" wrote: > On Fri, Dec 3, 2021 at 11:50 PM Bossart, Nathan wrote: >> I might hack something together for the separate worker approach, if >> for no other reason than to make sure I really understand how these >> functions work. If/when a better idea

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-06 Thread Bharath Rupireddy
On Fri, Dec 3, 2021 at 11:50 PM Bossart, Nathan wrote: > > On 12/3/21, 5:57 AM, "Bharath Rupireddy" > wrote: > > On Fri, Dec 3, 2021 at 3:01 AM Bossart, Nathan wrote: > >> > >> On 12/1/21, 6:48 PM, "Bharath Rupireddy" > >> wrote: > >> > +1 for the overall idea of making the checkpoint

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-03 Thread Bossart, Nathan
On 12/3/21, 5:57 AM, "Bharath Rupireddy" wrote: > On Fri, Dec 3, 2021 at 3:01 AM Bossart, Nathan wrote: >> >> On 12/1/21, 6:48 PM, "Bharath Rupireddy" >> wrote: >> > +1 for the overall idea of making the checkpoint faster. In fact, we >> > here at our team have been thinking about this

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-03 Thread Bharath Rupireddy
On Fri, Dec 3, 2021 at 3:01 AM Bossart, Nathan wrote: > > On 12/1/21, 6:48 PM, "Bharath Rupireddy" > wrote: > > +1 for the overall idea of making the checkpoint faster. In fact, we > > here at our team have been thinking about this problem for a while. If > > there are a lot of files that

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-02 Thread Bossart, Nathan
On 12/1/21, 6:48 PM, "Bharath Rupireddy" wrote: > +1 for the overall idea of making the checkpoint faster. In fact, we > here at our team have been thinking about this problem for a while. If > there are a lot of files that checkpoint has to loop over and remove, > IMO, that task can be

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-02 Thread Bossart, Nathan
On 12/1/21, 6:06 PM, "Euler Taveira" wrote: > Saying that a certain task is O(n) doesn't mean it needs a separate process to > handle it. Did you have a use case or even better numbers (% of checkpoint / > startup time) that makes your proposal worthwhile? I don't have specific numbers on hand,

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-01 Thread Bharath Rupireddy
On Thu, Dec 2, 2021 at 1:54 AM Bossart, Nathan wrote: > > Hi hackers, > > Thanks to 61752af, SyncDataDirectory() can make use of syncfs() to > avoid individually syncing all database files after a crash. However, > as noted earlier this year [0], there are still a number of O(n) tasks > that

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-01 Thread Euler Taveira
On Wed, Dec 1, 2021, at 9:19 PM, Bossart, Nathan wrote: > On 12/1/21, 2:56 PM, "Andres Freund" wrote: > > On 2021-12-01 20:24:25 +, Bossart, Nathan wrote: > >> I realize adding a new maintenance worker might be a bit heavy-handed, > >> but I think it would be nice to have somewhere to offload

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-01 Thread Bossart, Nathan
On 12/1/21, 2:56 PM, "Andres Freund" wrote: > On 2021-12-01 20:24:25 +, Bossart, Nathan wrote: >> I realize adding a new maintenance worker might be a bit heavy-handed, >> but I think it would be nice to have somewhere to offload tasks that >> really shouldn't impact startup and

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-01 Thread Andres Freund
Hi, On 2021-12-01 20:24:25 +, Bossart, Nathan wrote: > I realize adding a new maintenance worker might be a bit heavy-handed, > but I think it would be nice to have somewhere to offload tasks that > really shouldn't impact startup and checkpointing. I imagine such a > process would come in

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-01 Thread SATYANARAYANA NARLAPURAM
+1 to the idea. I don't see a reason why checkpointer has to do all of that. Keeping checkpoint to minimal essential work helps servers recover faster in the event of a crash. RemoveOldXlogFiles is also an O(N) operation that can at least be avoided during the end of recovery

O(n) tasks cause lengthy startups and checkpoints

2021-12-01 Thread Bossart, Nathan
Hi hackers, Thanks to 61752af, SyncDataDirectory() can make use of syncfs() to avoid individually syncing all database files after a crash. However, as noted earlier this year [0], there are still a number of O(n) tasks that affect startup and checkpointing that I'd like to improve. Below, I've