Thanks for the explanation, Erick :)
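
In case it helps anyone else following along, here is a rough, untested sketch (plain Lucene 3.x Directory API; the class name and the path argument are just placeholders) of how one could measure how much of an index directory is taken up by stored fields (*.fdt/*.fdx) versus everything else, along the lines Erick describes below:

import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class StoredFieldShare {
    public static void main(String[] args) throws IOException {
        // args[0]: path to the core's index directory, e.g. .../data/index
        Directory dir = FSDirectory.open(new File(args[0]));
        long stored = 0, total = 0;
        for (String name : dir.listAll()) {
            long len = dir.fileLength(name);
            total += len;
            // *.fdt/*.fdx hold the verbatim stored fields, not the inverted index
            if (name.endsWith(".fdt") || name.endsWith(".fdx")) {
                stored += len;
            }
        }
        dir.close();
        System.out.printf("stored fields: %,d bytes of %,d total (%.1f%%)%n",
                stored, total, 100.0 * stored / total);
    }
}

If the stored-field share turns out to be most of the index, then raw index size says little about search speed, which I think is exactly Erick's point.
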
2012/1/24, Erick Erickson <erickerick...@gmail.com>:
> Talking about "index size" can be very misleading. Take
> a look at http://lucene.apache.org/java/3_5_0/fileformats.html#file-names.
> Note that the *.fdt and *.fdx files are used for stored fields, i.e.
> the verbatim copy of data put in the index when you specify
> stored="true". These files have virtually no impact on search
> speed.
>
> So, if your *.fdx and *.fdt files are 90G out of a 100G index,
> that is a much different thing than if these files are 10G out of
> a 100G index.
>
> And this doesn't even mention the peculiarities of your query mix.
> Nor does it say a thing about whether your cheapest alternative
> is to add more memory.
>
> Anderson's method is about the only reliable one: you just have
> to test with your index and real queries. At some point you'll
> find your tipping point, typically when you come under memory
> pressure. And it's a balancing act between how much memory
> you allocate to the JVM and how much you leave for the op
> system.
>
> Bottom line: no hard and fast numbers. And you should periodically
> re-test the empirical numbers you *do* arrive at...
>
> Best,
> Erick
>
> On Tue, Jan 24, 2012 at 5:31 AM, Anderson vasconcelos
> <anderson.v...@gmail.com> wrote:
>> Apparently it is not so easy to determine when to break the content into
>> pieces. I'll investigate further the number of documents, the size of
>> each document, and what kind of search is being used. It seems I will
>> have to do a load test to identify the cutoff point at which to start
>> using the shard strategy.
>>
>> Thanks
>>
>> 2012/1/24, Dmitry Kan <dmitry....@gmail.com>:
>>> Hi,
>>>
>>> The article you gave mentions 13GB of index size. That is quite a small
>>> index from our perspective. We have noticed that at least Solr 3.4 has
>>> some sort of "choking" point with respect to growing index size. It
>>> just becomes substantially slower than what we need (a query on average
>>> taking more than 3-4 seconds) once the index size crosses a magic level
>>> (about 80GB according to our practical observations). We try to keep
>>> our indices at around 60-70GB for fast searches and above 100GB for
>>> slow ones. We also route the majority of user queries to the fast
>>> indices. Yes, caching may help, but we cannot necessarily afford adding
>>> more RAM for bigger indices. BTW, our documents are very small, so in a
>>> 100GB index we can have around 200 million documents. It would be
>>> interesting to see how you manage to ensure q-times under 1 sec with an
>>> index of 250GB. How many documents / facets do you ask for at most at a
>>> time? FYI, we ask for a thousand facets in one go.
>>>
>>> Regards,
>>> Dmitry
>>>
>>> On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann <
>>> v.kisselm...@googlemail.com> wrote:
>>>
>>>> Hi,
>>>> it depends on your hardware.
>>>> Read this:
>>>> http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/
>>>> Think about your cache config (few updates, big caches) and good
>>>> HW infrastructure.
>>>> In my case I can handle a 250GB index with 100 million docs on an i7
>>>> machine with RAID10 and 24GB RAM => q-times under 1 sec.
>>>> Regards,
>>>> Vadim
>>>>
>>>> 2012/1/24 Anderson vasconcelos <anderson.v...@gmail.com>:
>>>> > Hi
>>>> > Is there some index size (or number of docs) at which it becomes
>>>> > necessary to break the index into shards?
>>>> > I have an index 100GB in size. This index grows by 10GB per year
>>>> > (I don't have information on how many docs it has), and the docs
>>>> > will never be deleted. Thinking 30 years ahead, the index will be
>>>> > 400GB in size.
>>>> >
>>>> > I think it is not necessary to break it into shards, because I don't
>>>> > consider this a "large index". Am I correct? What is a real "large
>>>> > index"?
>>>> >
>>>> > Thanks
>>>
>
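
Since sharding and facet counts came up: purely as a sketch (untested, SolrJ against Solr 3.x; the host names, core names and the "category" field are made up for illustration), a distributed facet query across two shards might look roughly like this:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardedFacetQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at any one of the shard cores.
        SolrServer server = new CommonsHttpSolrServer("http://solr1:8983/solr/core0");

        SolrQuery q = new SolrQuery("*:*");
        // Fan the request out over both shards (Solr 3.x distributed search).
        q.set("shards", "solr1:8983/solr/core0,solr2:8983/solr/core1");
        q.setRows(10);
        q.setFacet(true);
        q.addFacetField("category");   // placeholder field name
        q.setFacetLimit(1000);         // e.g. ask for up to 1000 facet values

        QueryResponse rsp = server.query(q);
        System.out.println("QTime: " + rsp.getQTime() + " ms, found: "
                + rsp.getResults().getNumFound());
    }
}

Timing that kind of query against a copy of the real index and real query mix, as Anderson plans to do, is probably the only way to find the actual tipping point.
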