Thanks for the explanation, Erick :)

2012/1/24, Erick Erickson <erickerick...@gmail.com>:
> Talking about "index size" can be very misleading. Take
> a look at http://lucene.apache.org/java/3_5_0/fileformats.html#file-names.
> Note that the *.fdt and *.fdx files are used for stored fields, i.e.
> the verbatim copy of data put in the index when you specify
> stored="true". These files have virtually no impact on search
> speed.
>
> So, if your *.fdx and *.fdt files are 90G out of a 100G index
> it is a much different thing than if these files are 10G out of
> a 100G index.
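>
> (A quick way to check that ratio on disk is a few lines of Java; a rough
> sketch, assuming the index directory is passed as the first argument:)
>
>     import java.io.File;
>
>     public class StoredFieldRatio {
>         public static void main(String[] args) {
>             // Lucene index directory, e.g. data/index
>             File indexDir = new File(args[0]);
>             long total = 0, stored = 0;
>             for (File f : indexDir.listFiles()) {
>                 if (!f.isFile()) continue;
>                 total += f.length();
>                 // *.fdt / *.fdx hold the stored (verbatim) field data
>                 String name = f.getName();
>                 if (name.endsWith(".fdt") || name.endsWith(".fdx")) {
>                     stored += f.length();
>                 }
>             }
>             System.out.printf("stored fields: %d of %d bytes (%.1f%%)%n",
>                     stored, total, total == 0 ? 0.0 : 100.0 * stored / total);
>         }
>     }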
>
> And this doesn't even mention the peculiarities of your query mix.
> Nor does it say a thing about whether your cheapest alternative
> is to add more memory.
>
> Anderson's method is about the only reliable one: you just have
> to test with your index and real queries. At some point, you'll
> find your tipping point, typically when you come under memory
> pressure. And it's a balancing act between how much memory
> you allocate to the JVM and how much you leave for the operating
> system.
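>
> (One rough way to find that tipping point: replay a file of real queries
> against the running instance and watch how the latencies trend as the index
> and heap grow. A minimal sketch; the URL and the queries.txt file are
> assumptions:)
>
>     import java.io.BufferedReader;
>     import java.io.FileReader;
>     import java.io.InputStream;
>     import java.net.URL;
>     import java.net.URLEncoder;
>
>     public class QueryTimer {
>         public static void main(String[] args) throws Exception {
>             // One raw query per line, captured from real traffic
>             BufferedReader queries =
>                     new BufferedReader(new FileReader("queries.txt"));
>             String q;
>             while ((q = queries.readLine()) != null) {
>                 URL url = new URL("http://localhost:8983/solr/select?q="
>                         + URLEncoder.encode(q, "UTF-8") + "&rows=10");
>                 long start = System.nanoTime();
>                 InputStream in = url.openStream();
>                 while (in.read() != -1) { /* drain the response */ }
>                 in.close();
>                 long ms = (System.nanoTime() - start) / 1000000;
>                 System.out.println(ms + " ms  " + q);
>             }
>             queries.close();
>         }
>     }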
>
> Bottom line: No hard and fast numbers. And you should periodically
> re-test the empirical numbers you *do* arrive at...
>
> Best
> Erick
>
> On Tue, Jan 24, 2012 at 5:31 AM, Anderson vasconcelos
> <anderson.v...@gmail.com> wrote:
>> Apparently, it's not so easy to determine when to break the content into
>> pieces. I'll investigate further the number of documents, the size of each
>> document, and what kind of search is being used. It seems I will have to
>> do a load test to identify the cutoff point at which to start using the
>> sharding strategy.
>>
>> Thanks
>>
>> 2012/1/24, Dmitry Kan <dmitry....@gmail.com>:
>>> Hi,
>>>
>>> The article you gave mentions a 13GB index. That is quite a small index
>>> from our perspective. We have noticed that at least Solr 3.4 has some
>>> sort of "choking" point with respect to growing index size: it just
>>> becomes substantially slower than what we need (a query taking more than
>>> 3-4 seconds on average) once the index size crosses a magic level (about
>>> 80GB, going by our practical observations). We try to keep our indices at
>>> around 60-70GB for fast searches and above 100GB for slow ones. We also
>>> route the majority of user queries to the fast indices. Yes, caching may
>>> help, but we cannot necessarily afford to add more RAM for bigger
>>> indices. BTW, our documents are very small, so in a 100GB index we can
>>> have around 200 million documents. It would be interesting to see how you
>>> manage to ensure q-times under 1 sec with an index of 250GB. How many
>>> documents / facets do you ask for at a time, max? FYI, we ask for a
>>> thousand facets in one go.
>>>
>>> Regards,
>>> Dmitry
>>>
>>> On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann <
>>> v.kisselm...@googlemail.com> wrote:
>>>
>>>> Hi,
>>>> it depends on your hardware.
>>>> Read this:
>>>>
>>>> http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/
>>>> Think about your cache config (few updates, big caches) and a good
>>>> HW infrastructure.
>>>> In my case I can handle a 250GB index with 100 million docs on an i7
>>>> machine with RAID10 and 24GB RAM => q-times under 1 sec.
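>>>>
>>>> (And if you do end up sharding later, a distributed query in Solr 3.x is
>>>> just the same request fanned out via the shards parameter; a minimal
>>>> sketch, hostnames are made up:)
>>>>
>>>>     import java.io.InputStream;
>>>>     import java.net.URL;
>>>>
>>>>     public class ShardedQuery {
>>>>         public static void main(String[] args) throws Exception {
>>>>             // Hypothetical hosts, each serving one slice of the index
>>>>             String shards = "solr1:8983/solr,solr2:8983/solr";
>>>>             // The core receiving the request queries every shard and
>>>>             // merges the partial results before responding
>>>>             URL url = new URL("http://solr1:8983/solr/select?q=*:*"
>>>>                     + "&rows=10&shards=" + shards);
>>>>             InputStream in = url.openStream();
>>>>             byte[] buf = new byte[8192];
>>>>             int n, total = 0;
>>>>             while ((n = in.read(buf)) != -1) total += n;
>>>>             in.close();
>>>>             System.out.println("merged response: " + total + " bytes");
>>>>         }
>>>>     }
>>>>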
>>>> Regards
>>>> Vadim
>>>>
>>>>
>>>>
>>>> 2012/1/24 Anderson vasconcelos <anderson.v...@gmail.com>:
>>>> > Hi
>>>> > Is there some index size (or number of docs) at which it becomes
>>>> > necessary to break the index into shards?
>>>> > I have an index of 100GB. It grows by 10GB per year (I don't have
>>>> > information on how many docs it holds), and the docs will never be
>>>> > deleted. Thinking ahead 30 years, the index will be around 400GB in
>>>> > size.
>>>> >
>>>> > I think it is not necessary to break it into shards, because I don't
>>>> > consider this a "large index". Am I correct? What is a real "large
>>>> > index"?
>>>> >
>>>> >
>>>> > Thanks
>>>>
>>>
>
