@Erick thanks :) I'm with you on this; my load tests show the same.

@Dmitry my docs are small too, about 3-15KB per doc. I update my index
all the time and average 20-50 requests per minute (20% facet queries,
80% large boolean queries with wildcard/fuzzy terms). How many docs at
a time => it depends on the chosen filters, anywhere from 10 up to all
100 mio. I work with very small caches (strangely, if my index is under
100GB I need larger caches; over 100GB, smaller ones). My JVM has 6GB;
the remaining 18GB is left to the OS for I/O caching. With only a few
updates a day I would configure very big caches, like Tom Burton-West
(see the HathiTrust blog).
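For illustration, a minimal SolrJ sketch of that query mix (URL, field
names, and terms are invented; on SolrJ 3.4/3.5 the client class is
CommonsHttpSolrServer rather than HttpSolrServer):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QueryMixSample {
    public static void main(String[] args) throws Exception {
        // URL and field names are invented for the example
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        // ~80% of the traffic: large boolean query with wildcard/fuzzy terms
        SolrQuery bool = new SolrQuery("(title:lucen* OR title:solr~0.8) AND body:shard*");
        bool.setRows(10);
        QueryResponse r1 = server.query(bool);
        System.out.println("boolean hits: " + r1.getResults().getNumFound());

        // ~20% of the traffic: facet-only query
        SolrQuery facets = new SolrQuery("*:*");
        facets.setRows(0);              // counts only, no stored fields fetched
        facets.setFacet(true);
        facets.addFacetField("category");
        facets.setFacetLimit(100);
        QueryResponse r2 = server.query(facets);
        System.out.println("facet counts: " + r2.getFacetField("category").getValues());
    }
}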
Regards
Vadim

2012/1/24 Anderson vasconcelos <anderson.v...@gmail.com>:
> Thanks for the explanation, Erick :)
>
> 2012/1/24, Erick Erickson <erickerick...@gmail.com>:
>> Talking about "index size" can be very misleading. Take a look at
>> http://lucene.apache.org/java/3_5_0/fileformats.html#file-names.
>> Note that the *.fdt and *.fdx files are used for stored fields,
>> i.e. the verbatim copy of the data you put into the index when you
>> specify stored="true". These files have virtually no impact on
>> search speed.
>>
>> So, if your *.fdx and *.fdt files are 90G out of a 100G index, it
>> is a much different thing than if these files are 10G out of a
>> 100G index.
>>
>> And this doesn't even mention the peculiarities of your query mix,
>> nor does it say a thing about whether your cheapest alternative is
>> to add more memory.
>>
>> Anderson's method is about the only reliable one: you just have to
>> test with your index and real queries. At some point you'll find
>> your tipping point, typically when you come under memory pressure.
>> And it's a balancing act between how much memory you allocate to
>> the JVM and how much you leave for the op system.
>>
>> Bottom line: no hard and fast numbers. And you should periodically
>> re-test the empirical numbers you *do* arrive at...
>>
>> Best
>> Erick
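A quick way to see which case you are in is to sum the index files by
extension. A minimal sketch in plain Java, assuming the compound file
format is off (with useCompoundFile=true everything hides inside *.cfs):

import java.io.File;

public class StoredFieldsShare {
    public static void main(String[] args) {
        // point this at a core's index directory, e.g. .../solr/data/index
        File[] files = new File(args[0]).listFiles();
        if (files == null) {
            System.err.println("not a directory: " + args[0]);
            return;
        }
        long stored = 0, total = 0;
        for (File f : files) {
            total += f.length();
            // *.fdt (data) and *.fdx (pointers) hold the stored fields
            if (f.getName().endsWith(".fdt") || f.getName().endsWith(".fdx")) {
                stored += f.length();
            }
        }
        System.out.printf("stored fields: %d of %d bytes (%.1f%%)%n",
                stored, total, 100.0 * stored / total);
    }
}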
>> On Tue, Jan 24, 2012 at 5:31 AM, Anderson vasconcelos
>> <anderson.v...@gmail.com> wrote:
>>> Apparently it is not so easy to determine when to break the content
>>> into pieces. I'll investigate further the number of documents, the
>>> size of each document and what kind of search is being used. It
>>> seems I will have to do a load test to identify the cutoff point at
>>> which to start using the shard strategy.
>>>
>>> Thanks
>>>
>>> 2012/1/24, Dmitry Kan <dmitry....@gmail.com>:
>>>> Hi,
>>>>
>>>> The article you gave mentions a 13GB index. That is quite a small
>>>> index from our perspective. We have noticed that at least Solr 3.4
>>>> has some sort of "choking" point with respect to growing index
>>>> size. It just becomes substantially slower than what we need (a
>>>> query taking more than 3-4 seconds on average) once the index size
>>>> crosses a magic level (about 80GB, according to our practical
>>>> observations). We try to keep our indices at around 60-70GB for
>>>> fast searches, and above 100GB for slow ones. We also route the
>>>> majority of user queries to the fast indices. Yes, caching may
>>>> help, but we cannot necessarily afford adding more RAM for bigger
>>>> indices. BTW, our documents are very small, so in a 100GB index we
>>>> can have around 200 mil. documents. It would be interesting to see
>>>> how you manage to ensure q-times under 1 sec with an index of
>>>> 250GB. How many documents / facets do you ask for at a time, max.?
>>>> FYI, we ask for a thousand facets in one go.
>>>>
>>>> Regards,
>>>> Dmitry
>>>>
>>>> On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann <
>>>> v.kisselm...@googlemail.com> wrote:
>>>>
>>>>> Hi,
>>>>> it depends on your hardware.
>>>>> Read this:
>>>>> http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/
>>>>> Think about your cache config (few updates, big caches) and a
>>>>> good HW infrastructure.
>>>>> In my case I can handle a 250GB index with 100 mil. docs on an i7
>>>>> machine with RAID10 and 24GB RAM => q-times under 1 sec.
>>>>> Regards
>>>>> Vadim
>>>>>
>>>>> 2012/1/24 Anderson vasconcelos <anderson.v...@gmail.com>:
>>>>> > Hi
>>>>> > Is there some index size (or number of docs) at which it
>>>>> > becomes necessary to break the index into shards?
>>>>> > I have an index of 100GB. It grows by 10GB per year (I have no
>>>>> > information on how many docs it holds) and the docs will never
>>>>> > be deleted. Thinking 30 years ahead, the index will be 400GB in
>>>>> > size.
>>>>> >
>>>>> > I think sharding is not required, because I would not consider
>>>>> > this a "large index". Am I correct? What is a real "large
>>>>> > index"?
>>>>> >
>>>>> > Thanks
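For what it's worth, once an index does get split, a distributed query
over the shards is only one extra parameter. A minimal SolrJ sketch,
with invented host names:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardedQuerySample {
    public static void main(String[] args) throws Exception {
        // any shard can act as the aggregator for a distributed query
        SolrServer server = new HttpSolrServer("http://shard1:8983/solr");
        SolrQuery q = new SolrQuery("body:index*");
        // the shards parameter lists the cores whose results get merged
        q.setParam("shards", "shard1:8983/solr,shard2:8983/solr");
        QueryResponse rsp = server.query(q);
        System.out.println("total hits across shards: " + rsp.getResults().getNumFound());
    }
}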