My recommendation in these scenarios is to prototype:
- Fine-tune your schema.xml to map all the fields to the minimum-sized data types that meet your requirements.
- Assign an initial heap to the Java process (you can start as low as you like, even a few GB).
- Start the indexing process and monitor memory usage; this shows you how much memory you really need for indexing.
- Adjust the heap to a bit more than what you observed; don't over-allocate, or you may end up in nasty garbage collection scenarios.
- Check the index size; ideally you want enough free operating-system RAM available to cache the entire index in memory.
- If that's too expensive for the sizes you need, buying an SSD of that size is pretty much a good approach.
- Make sure you leave enough disk space for the index to temporarily triple in size.
- Benchmark your queries (including faceting, sorting, other stats, reranking etc.) to check whether the memory allocated is enough.
- Iterate.
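As an illustrative sketch of the first step (the field names here are hypothetical, chosen for VCF-like data; pint/plong/pfloat are the standard Solr point types), picking the smallest adequate type per field in schema.xml might look like:

```xml
<!-- Hypothetical VCF-style fields: use the smallest point type that fits
     the value range (prefer pint over plong, pfloat over pdouble) -->
<field name="chromosome" type="string" indexed="true" stored="true"  docValues="true"/>
<field name="position"   type="plong"  indexed="true" stored="true"  docValues="true"/>
<field name="quality"    type="pfloat" indexed="true" stored="false" docValues="true"/>
<!-- enable docValues only on fields you facet/sort on; set stored="false"
     on fields you never need to retrieve, to keep the index small -->
```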
This is a short guideline; the process can become much more complex, as there's plenty to tune and adjust, but this list can be a starting point.

Cheers
--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant
www.sease.io

On Sun, 13 Jun 2021 at 23:48, Dave <[email protected]> wrote:

> I've had very good luck with as much memory and as large an SSD as you can
> buy, and setting the JVM Xmx and Xms to exactly 31 GB and letting the Linux
> server do its own caching for the rest. 31 is a very specific number.
>
> > On Jun 13, 2021, at 5:28 PM, Syed Hasan <[email protected]> wrote:
> >
> > Hi Guys,
> > I'm brand new to Solr. I've been investigating the proper way to search
> > huge VCF file(s) in order to perform my bioinformatics analysis
> > functionality.
> >
> > I came across Solr. It sounds very interesting and promising. Before I dive
> > into it, I would like to know how much memory and hard-disk space will be
> > required for Solr.
> > The size of my VCF file(s) can be from 3 to 4 terabytes.
> >
> > Can you please guide me in this regard? Is there anything else that I
> > would need to consider before I attempt to leverage Solr?
> >
> > Thanks,
> > Hasan
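Dave's 31 GB figure relates to the JVM's compressed ordinary object pointers (compressed oops): heaps of roughly 32 GB or more disable them, making each reference twice as wide, so a 31 GB heap is the practical sweet spot just below that threshold. A hedged sketch of how that might be set in the standard Solr distribution's solr.in.sh:

```shell
# solr.in.sh -- pin min and max heap to the same value, just under the
# ~32 GB compressed-oops threshold; leave the rest of the machine's RAM
# free so the Linux page cache can hold the index files
SOLR_JAVA_MEM="-Xms31g -Xmx31g"
```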
