Re: Cloud storage modules for backup/restore

2023-04-24 Thread David Smiley
The large per-file write overhead suggests it would be beneficial to set useCompoundFile to true (the default is false). I think unlocking more write performance requires some sort of write-level cache so that segment merges can use local segment files if they were written recently. It could be

Re: Cloud storage modules for backup/restore

2023-04-24 Thread Kevin Risden
Solr already supports reading and indexing on cloud storage (ABFS, GCS, and S3) today, using the Hadoop HDFS module. I assume the same works with HDFS backup/restore as well. I haven't checked if all the supporting libraries are included in the shipped Solr distribution, but the HDFS filesystem su

Re: Cloud storage modules for backup/restore

2023-04-24 Thread Joel Bernstein
As for a Lucene/Solr directory on cloud storage: writes have a lot of overhead per file, hundreds of milliseconds. The read overhead is about half as much. I believe writes are so expensive because of the strong consistency of both GCS and S3. So I think the main bottleneck would be ind
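To make the per-file overhead concrete, here is a rough back-of-the-envelope sketch. It is not a measurement: the 200 ms write figure is an assumed stand-in for "hundreds of millis", the count of 12 files for a non-compound segment is hypothetical, and the model assumes files are written one at a time with no parallelism. The class and method names are made up for illustration. A compound-file segment is counted as 3 files (.cfs, .cfe, .si), which is why the useCompoundFile suggestion elsewhere in this thread reduces the fixed cost.

```java
public class PerFileOverheadSketch {

    /**
     * Fixed cost of touching fileCount files when each file carries a
     * constant per-file overhead, assuming strictly sequential access.
     */
    public static long fixedOverheadMillis(int fileCount, long perFileMillis) {
        if (fileCount < 0 || perFileMillis < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return (long) fileCount * perFileMillis;
    }

    public static void main(String[] args) {
        long writeMillis = 200;            // ASSUMPTION: stand-in for "hundreds of millis"
        long readMillis = writeMillis / 2; // the thread says reads cost about half as much

        int separateFiles = 12;            // HYPOTHETICAL file count for a non-compound segment
        int compoundFiles = 3;             // .cfs, .cfe and .si for a compound segment

        System.out.println("write, separate files: "
                + fixedOverheadMillis(separateFiles, writeMillis) + " ms");
        System.out.println("write, compound file:  "
                + fixedOverheadMillis(compoundFiles, writeMillis) + " ms");
        System.out.println("read, separate files:  "
                + fixedOverheadMillis(separateFiles, readMillis) + " ms");
        System.out.println("read, compound file:   "
                + fixedOverheadMillis(compoundFiles, readMillis) + " ms");
    }
}
```

The estimate covers only the fixed per-file cost; transfer time for the bytes themselves comes on top of it and is the same in both layouts.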

Re: Cloud storage modules for backup/restore

2023-04-21 Thread Ishan Chattopadhyaya
My colleague at SearchScale has tried S3FS, and running Solr indexes off S3. We can chat about it, if you're interested.
On Fri, 21 Apr, 2023, 10:38 am David Smiley, wrote:
> Cool!
> I wonder if anyone has tried such things for a Lucene/Solr "Directory" as
> well?
>
> ~ David Smiley
> Apache Luc

Re: Cloud storage modules for backup/restore

2023-04-20 Thread David Smiley
Cool! I wonder if anyone has tried such things for a Lucene/Solr "Directory" as well?
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
On Mon, Apr 17, 2023 at 1:14 PM Joel Bernstein wrote:
> I've been testing Java NIO providers for cloud storage. These

Re: Cloud storage modules for backup/restore

2023-04-17 Thread Joel Bernstein
I've been testing Java NIO providers for cloud storage. These two in particular worked for our use cases:
https://github.com/googleapis/java-storage-nio
https://github.com/carlspring/s3fs-nio
I believe an Azure provider is available. We've been working on sponsoring getting the s3 provider into
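For readers unfamiliar with the NIO provider model: code written against java.nio.file.Path and Files does not need to know which file system sits behind a Path, which is what makes pluggable cloud providers attractive for backup/restore. The sketch below uses only the JDK and runs against the default local file system. It does not use either project linked above, and whether a cloud provider handles every call here the same way is an assumption that would need checking against that provider. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.spi.FileSystemProvider;
import java.util.ArrayList;
import java.util.List;

public class NioProviderSketch {

    /** URI schemes of the file system providers the JVM has installed. */
    public static List<String> installedSchemes() {
        List<String> schemes = new ArrayList<>();
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
            schemes.add(provider.getScheme());
        }
        return schemes;
    }

    /**
     * Copies source into backupDir and reads the copy back as UTF-8 text.
     * The target is resolved from the file name as a String so that source
     * and backupDir may belong to different providers.
     */
    public static String backupAndReadBack(Path source, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        Path target = backupDir.resolve(source.getFileName().toString());
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Local temp directory stands in for a backup repository location.
        Path tmp = Files.createTempDirectory("nio-sketch");
        Path source = Files.write(tmp.resolve("segments_1"),
                "demo".getBytes(StandardCharsets.UTF_8));

        System.out.println("installed schemes: " + installedSchemes());
        System.out.println("read back: " + backupAndReadBack(source, tmp.resolve("backup")));
    }
}
```

With a third-party provider on the classpath, its scheme would show up in installedSchemes(), and a Path obtained from that provider could in principle be passed as backupDir without changing backupAndReadBack.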

Re: Cloud storage modules for backup/restore

2023-04-10 Thread Ishan Chattopadhyaya
Oh thanks, Jan. I had missed it. It is a shame because it looks like a very neat project.
On Mon, 10 Apr, 2023, 23:53 Jan Høydahl, wrote:
> Looks like a nice project. With the promise of low-hanging support for
> more providers than those three for free.
>
> However, https://lists.apache.org/thr

Re: Cloud storage modules for backup/restore

2023-04-10 Thread Jan Høydahl
Looks like a nice project, with the promise of low-hanging, free support for more providers than those three. However, https://lists.apache.org/thread/w61gzk2ohjtshbwcb5gy6wb2htv7fo0x does not look promising - they plan to move the project to the Attic, and no new releases have happened durin

Re: Cloud storage modules for backup/restore

2023-04-10 Thread Gus Heck
Sounds interesting. I don't really know anything about jclouds, and a quick glance at your link didn't tell me much, but if they ship libraries that can plug in (or otherwise be leveraged without the need for any external software) and handle connectivity, that sounds like a win. Not as keen if it requires an

Re: Cloud storage modules for backup/restore

2023-04-10 Thread Ishan Chattopadhyaya
Supported storage providers, FYI: https://jclouds.apache.org/reference/providers/#blobstore-providers
On Mon, 10 Apr 2023 at 22:49, Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
> TBH, I haven't personally used either of them extensively, but just synced
> up with my colleague who buil

Re: Cloud storage modules for backup/restore

2023-04-10 Thread Ishan Chattopadhyaya
TBH, I haven't personally used either of them extensively, but I just synced up with my colleague who built that solution. So, I thought of bringing it up here for any additional points of consideration (in case jclouds wasn't considered earlier). I'm not much invested in this effort either way as

Cloud storage modules for backup/restore

2023-04-10 Thread Ishan Chattopadhyaya
Hi all, For backup/restore, we have out-of-the-box support for GCS and S3, but not Azure. I think we should deprecate both the S3 and GCS modules and adopt the Apache jclouds project, which supports all three. For testing, we could try MinIO (unless we are already happy with S3Mock that we use today