My recommendation in these scenarios is to prototype.

   - Fine-tune your schema.xml so that every field maps to the smallest
   data type that meets your requirements.
   - assign an initial heap size to the Java process (you can start as low
   as you like, even a few GB)
   - start the indexing process and monitor the memory usage (this shows
   how much memory you really need for indexing)
   - set the memory allocation slightly above what you measured; don't
   over-allocate or you may end up in nasty garbage collection scenarios
   - check the index size; ideally you want enough free operating system
   RAM to hold the entire index in memory
   - if that's too expensive for the sizes you need, buying an SSD of that
   size is a good alternative
   - make sure you leave enough disk space for the index to temporarily
   triple in size
   - benchmark your queries (including faceting, sorting, other stats,
   reranking, etc.) to check whether the memory allocated is enough
   - iterate

This is a short guideline; the process can become much more complex, as
there's plenty to tune and adjust, but this list can be a starting point.
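
To make the arithmetic in the list concrete, here is a minimal Python
sketch that turns two prototype measurements (peak heap used while indexing
and the resulting index size) into rough targets. The function name, the
field names and the 25% heap margin are my own illustrative assumptions,
not Solr settings or official recommendations.

```python
from dataclasses import dataclass


@dataclass
class SizingEstimate:
    heap_gb: float        # JVM heap to allocate
    page_cache_gb: float  # free OS RAM wanted to hold the whole index
    total_ram_gb: float   # heap plus page cache
    disk_gb: float        # disk space allowing the index to triple temporarily


def estimate_sizing(peak_heap_gb: float, index_gb: float,
                    heap_headroom: float = 0.25) -> SizingEstimate:
    """Turn prototype measurements into rough hardware targets.

    heap_headroom (25% by default) is an arbitrary illustrative margin,
    not a Solr recommendation; choose it from what you observe.
    """
    if peak_heap_gb <= 0 or index_gb <= 0:
        raise ValueError("measurements must be positive")
    heap = peak_heap_gb * (1 + heap_headroom)
    return SizingEstimate(
        heap_gb=heap,
        page_cache_gb=index_gb,
        total_ram_gb=heap + index_gb,
        disk_gb=index_gb * 3,
    )


if __name__ == "__main__":
    # Hypothetical prototype: 8 GB peak heap, 200 GB index on disk.
    print(estimate_sizing(peak_heap_gb=8, index_gb=200))
```

Treat the output only as a first guess to feed back into the benchmark and
iterate steps; real query load (faceting, sorting) can move the heap figure
considerably.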

Cheers

--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Sun, 13 Jun 2021 at 23:48, Dave <[email protected]> wrote:

> had very good luck with as much memory and as large an SSD as you can
> buy, and setting the JVM Xmx and Xms to exactly 31gb and letting the Linux
> server do its own caching for the rest. 31 is a very specific number
>
> > On Jun 13, 2021, at 5:28 PM, Syed Hasan <[email protected]> wrote:
> >
> > Hi Guys,
> > I'm brand new to solr. I've been investigating the proper way to search
> > huge VCF file(s) in order to perform my bioinformatics analysis
> > functionality.
> >
> > I came across SOLR. Sounds very interesting and promising. Before I dive
> > into it, I would like to know how much memory and hard-disk space will be
> > required for solr.
> > The size of My VCF file(s) can be from 3 to 4 terabytes.
> >
> > Can you please guide me in this regard? Is there anything else that I
> > would need to consider before I attempt to leverage solr?
> >
> > Thanks,
> > Hasan
>
