Re: Recommended index-size per core

2017-05-11 Thread Erick Erickson
One additional bit: the *.fdt files contain the stored values (i.e. stored=true). This is a verbatim, compressed copy of the input for these fields. This data does not need to reside in any memory. Say you have rows=10, and numFound is 10,000,000. The stored data is only accessed for the 10 returned d
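As a rough illustration of the rows=10 vs. numFound=10,000,000 point, here is a toy model that counts how many documents' stored values get read for one page of results. It is a hypothetical sketch, not Solr or Lucene code: `stored_docs_read` is an illustrative name, and the model ignores other stored-field reads (e.g. highlighting or export).

```python
def stored_docs_read(num_found: int, rows: int, start: int = 0) -> int:
    """Hypothetical model: stored values (.fdt) are fetched only for the
    page of documents actually returned, not for every matching document."""
    remaining = max(num_found - start, 0)  # matches left after the offset
    return min(rows, remaining)            # at most one page is fetched


print(stored_docs_read(10_000_000, 10))  # 10
```

The number of stored-document reads tracks the page size, which is why the .fdt data does not need to stay memory-resident for matching and ranking.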

Re: Recommended index-size per core

2017-05-11 Thread Shawn Heisey
On 5/11/2017 4:59 PM, S G wrote:
> How can 50GB index be handled by a 10GB heap?
> I am a developer myself and would love to know as many details as possible.
> So a long answer would be much appreciated.

Lucene (which is what provides large pieces of Solr's functionality) does not read the enti

Re: Recommended index-size per core

2017-05-11 Thread S G
Thanks Toke. Your answer did help me a lot. But one part of your answer is something that has always been confusing to me.
> The JVM heap is not used for caching the index data directly (although it holds derived data). What you need is free memory on your machine for OS disk-caching.
> The

Re: Recommended index-size per core

2017-05-11 Thread Shawn Heisey
On 5/10/2017 11:52 AM, S G wrote:
> Is there a recommendation on the size of index that one should host
> per core?

No, there really isn't. I can list off a bunch of recommendations, but a whole bunch of things that I don't know about your install could make those recommendations completely wrong

Re: Recommended index-size per core

2017-05-11 Thread David Hastings
I am curious about this as well. I have generally been using about a third of available memory for the Java heap, so I give the JVM 50 GB out of 150 GB. Do you think this should be reduced?

On Wed, May 10, 2017 at 6:36 PM, Toke Eskildsen wrote:
> S G wrote:
> > *Rough estimates for an initial siz
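The thread's point is that the index is cached by the OS outside the JVM heap, so every gigabyte given to the heap is a gigabyte the OS cannot use for the index files. A minimal back-of-the-envelope sketch of that trade-off, assuming the heap is the only large consumer of RAM (JVM off-heap memory and other processes are ignored unless passed as `other_gb`); the function names are hypothetical:

```python
def page_cache_budget_gb(total_ram_gb: float, heap_gb: float,
                         other_gb: float = 0.0) -> float:
    """RAM left for the OS disk cache after the JVM heap and other processes.

    Simplification (assumption): ignores JVM off-heap memory and OS overhead.
    """
    return max(total_ram_gb - heap_gb - other_gb, 0.0)


def cacheable_fraction(total_ram_gb: float, heap_gb: float, index_gb: float,
                       other_gb: float = 0.0) -> float:
    """Upper bound on the fraction of the on-disk index the OS can keep cached."""
    if index_gb <= 0:
        return 1.0
    budget = page_cache_budget_gb(total_ram_gb, heap_gb, other_gb)
    return min(budget / index_gb, 1.0)


# A 150 GB machine with a 50 GB heap, as described above.
print(page_cache_budget_gb(150, 50))     # 100.0
print(cacheable_fraction(150, 50, 50))   # 1.0
print(cacheable_fraction(150, 50, 200))  # 0.5
```

Whether a smaller heap is safe depends on what the heap actually needs (caches, sorting, faceting), which this arithmetic does not capture.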

Re: Recommended index-size per core

2017-05-10 Thread Toke Eskildsen
S G wrote:
> *Rough estimates for an initial size:*
>
> 50gb index is best served if all of it is in memory.

Assuming you need low latency and/or high throughput, yes. I mention this because in many cases the requirements for number of simultaneous users and response times are known (at least

Recommended index-size per core

2017-05-10 Thread S G
Hi,

Is there a recommendation on the size of index that one should host per core? The idea is to come up with an *initial* shard/replica setting for a load test, and then arrive at a good cluster size based on that testing.

*Example:*
Num documents: 100 million
Average document size: 1kb
So total
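The example numbers can be turned into a raw input-volume estimate, the usual starting point for an initial shard count. This is a hedged sketch: it measures raw input before indexing (the on-disk index may be smaller or larger depending on the schema, stored fields and compression), it assumes 1kb means 1,000 bytes, and the shard count of 4 is purely hypothetical:

```python
def raw_input_bytes(num_docs: int, avg_doc_bytes: int) -> int:
    """Raw input volume before indexing (not the on-disk index size)."""
    return num_docs * avg_doc_bytes


def per_shard_bytes(total_bytes: int, num_shards: int) -> float:
    """Even split across shards (assumes uniform document routing)."""
    return total_bytes / num_shards


total = raw_input_bytes(100_000_000, 1_000)  # 100 million docs x 1kb each
print(total / 1e9)                           # 100.0 (decimal GB)
print(per_shard_bytes(total, 4) / 1e9)       # 25.0, for a hypothetical 4 shards
```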