Re: Limiting Backup IO

2023-07-10 Thread Bruno Roustant
Nice +1 On Fri, Jul 7, 2023 at 16:07, David Smiley wrote: > I gave it a look. I like it! > > ~ David > > > On Thu, Jul 6, 2023 at 6:22 PM Pierre Salagnac > wrote: > > > Here is my POC to add a queue into CoreAdminHandler: > > https://github.com/apache/solr/pull/1761 > > > > It does the foll

Re: Limiting Backup IO

2023-07-07 Thread David Smiley
I gave it a look. I like it! ~ David On Thu, Jul 6, 2023 at 6:22 PM Pierre Salagnac wrote: > Here is my POC to add a queue into CoreAdminHandler: > https://github.com/apache/solr/pull/1761 > > It does the following: > - add a flag to core admin operations to be marked as expensive. For now, >

Re: Limiting Backup IO

2023-07-06 Thread Pierre Salagnac
Here is my POC to add a queue into CoreAdminHandler: https://github.com/apache/solr/pull/1761 It does the following: - add a flag to core admin operations to be marked as expensive. For now, only backup and restore are expensive; this may be extended. - in CoreAdminHandler, we count the number of
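
For readers skimming the archive, the general idea of flagging operations as expensive and making them wait their turn could be sketched roughly as below. This is not the code from the PR: the class name `ExpensiveOpLimiter`, its methods, and the choice of a fair semaphore with a fixed cap are all assumptions made for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not taken from the PR): caps how many operations
 * flagged as "expensive" may run at the same time. Cheap operations
 * bypass the limit entirely.
 */
class ExpensiveOpLimiter {
  private final Semaphore permits;

  ExpensiveOpLimiter(int maxConcurrentExpensiveOps) {
    // Fair semaphore: waiting callers are served in arrival order, like a queue.
    this.permits = new Semaphore(maxConcurrentExpensiveOps, true);
  }

  /** Runs op right away if it is cheap; otherwise waits for a free permit first. */
  <T> T run(boolean expensive, Supplier<T> op) {
    if (!expensive) {
      return op.get();
    }
    permits.acquireUninterruptibly();
    try {
      return op.get();
    } finally {
      // Always hand the permit back, even if op throws.
      permits.release();
    }
  }

  /** Number of expensive operations that could start right now. */
  int availablePermits() {
    return permits.availablePermits();
  }
}
```

With a cap of 2, a third expensive call made while two are still running would block until one of them finishes, whereas calls passed with `expensive = false` are never delayed.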

Re: Limiting Backup IO

2023-06-29 Thread Pierre Salagnac
Jason, I haven't done much scalability testing, so it's hard to give accurate numbers on when we start having issues. For the environment I looked at in detail, we run a 16-node cluster, and the collection I wasn't able to back up has about 1500 shards, ~1.5 GB each. Core backups/restores are expensiv

Re: Limiting Backup IO

2023-06-27 Thread David Smiley
Here's a POC: https://github.com/apache/solr/pull/1729 ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Mon, Jun 19, 2023 at 3:36 PM David Smiley wrote: > Has anyone mitigated the potentially large IO impact of doing a backup of > a large collection

Re: Limiting Backup IO

2023-06-27 Thread David Smiley
Here's a POC: https://github.com/apache/solr/pull/1729 ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Mon, Jun 26, 2023 at 1:53 PM Jason Gerlowski wrote: > Sounds like something that would be very useful for folks. > > I'm sure it'd be very depend

Re: Limiting Backup IO

2023-06-26 Thread Jason Gerlowski
Sounds like something that would be very useful for folks. I'm sure it'd be very dependent on your data and the type of backup, but I'm curious - if you can share Pierre - is there a number of cores-per-node being backed up where you start to see problems? Jason On Wed, Jun 21, 2023 at 8:34 AM P

Re: Limiting Backup IO

2023-06-21 Thread Pierre Salagnac
Thanks for starting this thread David. I've been internally working on this, since we have issues (query failures) during backups of big collections because of IO saturation. I see two different approaches to solve this: 1. Throttle at the IO level, like David mentioned. 2. Limit the number of co

Re: Limiting Backup IO

2023-06-20 Thread Ishan Chattopadhyaya
Might be a good question for users@ list, I guess. I'm sure other users must've thought about this. Cross posting there, as I'm curious myself too. On Tue, 20 Jun 2023 at 01:07, David Smiley wrote: > Has anyone mitigated the potentially large IO impact of doing a backup of a > large collection o

Limiting Backup IO

2023-06-19 Thread David Smiley
Has anyone mitigated the potentially large IO impact of doing a backup of a large collection or just in general? If the collection is large enough, there very well could be many shards on one host and it could saturate the IO. I wonder if there should be a rate limit mechanism or some other mecha
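
As a rough illustration of what a rate limit mechanism for backup writes might look like, here is a minimal sketch. It is not the approach from any PR in this thread: the class `BackupRateLimiter`, its constructor arguments, and the pacing policy are hypothetical. It only computes how long a writer should pause before each chunk; the caller is assumed to do the actual sleeping.

```java
/**
 * Hypothetical sketch: paces writes so that, over time, throughput stays
 * at or below a target rate. Time is passed in explicitly (nanoseconds)
 * so the logic can be exercised without a real clock.
 */
class BackupRateLimiter {
  private final double bytesPerSecond;
  // Earliest time at which the next chunk may start being written.
  private long nextAllowedNanos;

  BackupRateLimiter(double mbPerSec, long nowNanos) {
    this.bytesPerSecond = mbPerSec * 1024 * 1024;
    this.nextAllowedNanos = nowNanos;
  }

  /**
   * Registers a chunk of the given size that the caller wants to write at
   * nowNanos, and returns how many nanoseconds the caller should wait first.
   */
  long pauseNanos(long bytes, long nowNanos) {
    // If the writer has been idle, it does not accumulate extra budget.
    long start = Math.max(nowNanos, nextAllowedNanos);
    long costNanos = (long) (bytes / bytesPerSecond * 1_000_000_000L);
    nextAllowedNanos = start + costNanos;
    return start - nowNanos;
  }
}
```

A copy loop could call `pauseNanos(chunk.length, System.nanoTime())` before writing each chunk and sleep for the returned duration, for example with `java.util.concurrent.locks.LockSupport.parkNanos`. Sharing one limiter instance across all backups on a host would bound the total rate rather than the per-shard rate.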