@Erick
Thanks :)
I agree with your opinion.
My load tests show the same.

@Dmitry
My docs are small too, about 3-15KB per doc.
I update my index all the time and have an average of 20-50 requests
per minute (20% facet queries, 80% large boolean queries with
wildcard/fuzzy terms). How many docs at a time? That depends on the
chosen filters, anywhere from 10 up to all 100 million.
I work with very small caches (strangely, if my index is under
100GB I need larger caches, over 100GB smaller caches).
My JVM has 6GB; the remaining 18GB is left to the OS for I/O caching.
With only a few updates a day I would configure very big caches, like Tom
Burton-West (see HathiTrust's blog).
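
For illustration, here is a rough solrconfig.xml sketch of what "very big
caches" could look like (the sizes are placeholders, not my real values;
tune them against your own load tests):

<!-- placeholder sizes, for illustration only -->
<filterCache      class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="2048"/>
<queryResultCache class="solr.LRUCache"     size="16384" initialSize="4096" autowarmCount="1024"/>
<documentCache    class="solr.LRUCache"     size="16384" initialSize="4096"/>

With frequent updates like mine this would backfire: the caches are thrown
away on every commit, and large autowarmCount values only make opening the
new searcher slower.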

Regards Vadim



2012/1/24 Anderson vasconcelos <anderson.v...@gmail.com>:
> Thanks for the explanation Erick :)
>
> 2012/1/24, Erick Erickson <erickerick...@gmail.com>:
>> Talking about "index size" can be very misleading. Take
>> a look at http://lucene.apache.org/java/3_5_0/fileformats.html#file-names.
>> Note that the *.fdt and *.fdx files are used for stored fields, i.e.
>> the verbatim copy of data put in the index when you specify
>> stored="true". These files have virtually no impact on search
>> speed.
>>
>> So, if your *.fdx and *.fdt files are 90G out of a 100G index
>> it is a much different thing than if these files are 10G out of
>> a 100G index.
>>
>> And this doesn't even mention the peculiarities of your query mix.
>> Nor does it say a thing about whether your cheapest alternative
>> is to add more memory.
>>
>> Anderson's method is about the only reliable one: you just have
>> to test with your index and real queries. At some point, you'll
>> find your tipping point, typically when you come under memory
>> pressure. And it's a balancing act between how much memory
>> you allocate to the JVM and how much you leave for the operating
>> system.
>>
>> Bottom line: No hard and fast numbers. And you should periodically
>> re-test the empirical numbers you *do* arrive at...
>>
>> Best
>> Erick
>>
>> On Tue, Jan 24, 2012 at 5:31 AM, Anderson vasconcelos
>> <anderson.v...@gmail.com> wrote:
>>> Apparently, it's not so easy to determine when to break the content into
>>> pieces. I'll investigate further the number of documents, the size of
>>> each document, and what kind of search is being used. It seems I will
>>> have to do a load test to identify the cutoff point at which to begin
>>> using the sharding strategy.
>>>
>>> Thanks
>>>
>>> 2012/1/24, Dmitry Kan <dmitry....@gmail.com>:
>>>> Hi,
>>>>
>>>> The article you gave mentions a 13GB index size. That is quite a small
>>>> index from our perspective. We have noticed that at least Solr 3.4 has
>>>> some sort of "choking" point with respect to growing index size: it
>>>> becomes substantially slower than what we need (a query taking more than
>>>> 3-4 seconds on average) once the index size crosses a magic level (about
>>>> 80GB according to our practical observations). We try to keep our indices
>>>> at around 60-70GB for fast searches and above 100GB for slow ones, and we
>>>> route the majority of user queries to the fast indices. Yes, caching may
>>>> help, but we cannot necessarily afford adding more RAM for bigger
>>>> indices. BTW, our documents are very small, so in a 100GB index we can
>>>> have around 200 mil. documents. It would be interesting to see how you
>>>> manage to ensure q-times under 1 sec with an index of 250GB. How many
>>>> documents / facets do you ask for at a time, max.? FYI, we ask for a
>>>> thousand facets in one go.
>>>>
>>>> Regards,
>>>> Dmitry
>>>>
>>>> On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann <
>>>> v.kisselm...@googlemail.com> wrote:
>>>>
>>>>> Hi,
>>>>> It depends on your hardware.
>>>>> Read this:
>>>>>
>>>>> http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/
>>>>> Think about your cache config (few updates, big caches) and a good
>>>>> HW infrastructure.
>>>>> In my case I can handle a 250GB index with 100 mil. docs on an i7
>>>>> machine with RAID10 and 24GB RAM => q-times under 1 sec.
>>>>> Regards
>>>>> Vadim
>>>>>
>>>>>
>>>>>
>>>>> 2012/1/24 Anderson vasconcelos <anderson.v...@gmail.com>:
>>>>> > Hi
>>>>> > Is there some index size (or number of docs) at which it becomes
>>>>> > necessary to break the index into shards?
>>>>> > I have an index of 100GB in size. This index grows by 10GB per year
>>>>> > (I don't have information on how many docs it holds) and the docs
>>>>> > will never be deleted. Thinking ahead 30 years, the index will be
>>>>> > 400GB in size.
>>>>> >
>>>>> > I think it is not necessary to break it into shards, because I don't
>>>>> > consider this a "large index". Am I correct? What is a real "large
>>>>> > index"?
>>>>> >
>>>>> >
>>>>> > Thanks
>>>>>
>>>>
>>
