Big per-file overhead on writing suggests it'd be beneficial to set
useCompoundFile to true (the default is false).
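To make the reasoning above concrete, here is a rough back-of-the-envelope sketch. It is not a measurement. The 300 ms figure stands in for "hundreds of millis", and both file counts are illustrative assumptions, as is the idea that the overhead is paid once per final file, sequentially.

```java
// Back-of-the-envelope estimate only; every number below is an assumption.
final class SegmentWriteOverhead {
    /** Fixed latency paid per segment if each file written costs one round of overhead. */
    static long overheadMillis(int filesPerSegment, long perFileOverheadMillis) {
        return (long) filesPerSegment * perFileOverheadMillis;
    }

    public static void main(String[] args) {
        long perFileMillis = 300; // assumed stand-in for "hundreds of millis" per file
        int separateFiles = 12;   // assumed file count for a non-compound segment
        int compoundFiles = 3;    // assumed: compound data, its entry table, segment info

        System.out.println("separate files: " + overheadMillis(separateFiles, perFileMillis) + " ms");
        System.out.println("compound file:  " + overheadMillis(compoundFiles, perFileMillis) + " ms");
    }
}
```

Under these assumptions the fixed cost per segment drops from 3600 ms to 900 ms. The real saving depends on how and where the intermediate files are written before they are packed.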
I think unlocking more write performance requires some sort of write-level
cache so that segment merges can use local segment files if they have been
written recently. It could be
Solr already supports reading and indexing on cloud storage today - ABFS,
GCS, and S3 - using the Hadoop HDFS module. I assume the same works with
HDFS backup/restore as well. I haven't checked if all the supporting
libraries are included in the shipped Solr distribution, but the HDFS
filesystem su
As for a Lucene/Solr Directory on cloud storage: writes carry a lot of
overhead per file, hundreds of milliseconds. The read overhead is about
half as much. I believe the write is so expensive due to the strong
consistency of both GCS and S3. So I think the main bottleneck would
be ind
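One way to check per-file figures like these is a small probe that times the creation and reading of many tiny files. The sketch below is a minimal, hypothetical probe written against java.nio.file. It runs against a local temporary directory here, and pointing it at a cloud-backed Path is an assumption left to the reader. The class and method names are illustrative and not part of Lucene or Solr.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical latency probe; not part of Lucene or Solr.
final class PerFileLatency {
    /** Average nanoseconds to create and write `count` small files under `dir`. */
    static long avgWriteNanos(Path dir, int count, byte[] payload) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            Files.write(dir.resolve("probe-" + i + ".bin"), payload);
        }
        return (System.nanoTime() - start) / count;
    }

    /** Average nanoseconds to read back the files written by avgWriteNanos. */
    static long avgReadNanos(Path dir, int count) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            Files.readAllBytes(dir.resolve("probe-" + i + ".bin"));
        }
        return (System.nanoTime() - start) / count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("latency-probe"); // local stand-in for a cloud path
        byte[] payload = new byte[1024]; // small payload so fixed overhead dominates
        int count = 20;
        System.out.printf("write: %d us/file%n", avgWriteNanos(dir, count, payload) / 1_000);
        System.out.printf("read:  %d us/file%n", avgReadNanos(dir, count) / 1_000);
    }
}
```

Keeping the payload small means the timing mostly reflects fixed per-file cost rather than throughput, which is the quantity discussed above.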
My colleague at SearchScale has tried S3FS and running Solr indexes off
S3. We can chat about it, if you're interested.
On Fri, 21 Apr, 2023, 10:38 am David Smiley, wrote:
> Cool!
> I wonder if anyone has tried such things for a Lucene/Solr "Directory" as
> well?
>
> ~ David Smiley
> Apache Luc
Cool!
I wonder if anyone has tried such things for a Lucene/Solr "Directory" as
well?
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
On Mon, Apr 17, 2023 at 1:14 PM Joel Bernstein wrote:
> I've been testing Java NIO providers for cloud storage. These
I've been testing Java NIO providers for cloud storage. These two in
particular worked for our use cases:
https://github.com/googleapis/java-storage-nio
https://github.com/carlspring/s3fs-nio
I believe an Azure provider is available.
We've been working on sponsoring getting the s3 provider into
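Since such providers plug into the standard java.nio.file SPI, a quick way to see whether one is visible to the JVM is to look it up by URI scheme. The sketch below uses only JDK calls. The scheme names "gs" and "s3" are assumptions about what the two linked providers register, so check each project's documentation before relying on them.

```java
import java.nio.file.spi.FileSystemProvider;
import java.util.Optional;

// Looks up installed NIO file system providers by URI scheme (JDK API only).
final class NioProviderLookup {
    static Optional<FileSystemProvider> findProvider(String scheme) {
        return FileSystemProvider.installedProviders().stream()
                .filter(p -> p.getScheme().equalsIgnoreCase(scheme))
                .findFirst();
    }

    public static void main(String[] args) {
        // "gs" and "s3" are assumed scheme names; "file" is the JDK default provider.
        for (String scheme : new String[] {"file", "gs", "s3"}) {
            String status = findProvider(scheme).isPresent() ? "installed" : "not on classpath";
            System.out.println(scheme + " -> " + status);
        }
    }
}
```

On a plain JDK only the built-in providers show up. A cloud provider would appear once its jar is on the classpath and registered as a service.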
Oh thanks, Jan. I had missed it. It is a shame because it looks like a very
neat project.
On Mon, 10 Apr, 2023, 23:53 Jan Høydahl, wrote:
> Looks like a nice project. With the promise of low-hanging support for
> more providers than those three for free.
>
> However, https://lists.apache.org/thr
Looks like a nice project. With the promise of low-hanging support for more
providers than those three for free.
However, https://lists.apache.org/thread/w61gzk2ohjtshbwcb5gy6wb2htv7fo0x does
not look promising - they plan to move the project to the Attic, and no new
releases have happened durin
Sounds interesting. Don't really know anything about jclouds, a quick
glance at your link didn't tell me much, but if they ship libraries that
can plug in (or otherwise be leveraged without need for any external
software) and handle connectivity that sounds like a win. Not as keen if it
requires an
Supported storage providers, FYI:
https://jclouds.apache.org/reference/providers/#blobstore-providers
On Mon, 10 Apr 2023 at 22:49, Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:
> TBH, I haven't personally used either of them extensively, but just synced
> up with my colleague who buil
TBH, I haven't personally used either of them extensively, but just synced
up with my colleague who built that solution. So, I thought of bringing it
up here for any additional points of consideration (in case jclouds wasn't
considered earlier).
I'm not invested in this effort much either way as
Hi all,
For backup/restore, we have out-of-the-box support for GCS and S3, but not
Azure. I think we should deprecate both the modules for S3 and GCS, and
adopt the Apache jclouds project, which supports all three. For testing, we
could try MinIO (unless we are already happy with S3Mock that we use today