On Mon, Nov 9, 2015 at 11:19 AM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:
> I thought docValues were per segment, so the price of un-inversion was
> effectively paid on each commit for all the segments, as opposed to
> just the updated one.

Both the field cache (i.e. uninverting indexed values) and docValues
are mostly per-segment (I say mostly because some uses still require
building a global ord map).
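
To make the "global ord map" point concrete, here is a toy Python sketch. It is an illustration under simplifying assumptions, not Lucene's actual implementation. Each segment keeps its own sorted term dictionary, so a term's ordinal is only meaningful inside that segment. Faceting across the whole index needs one shared numbering, and the sketch builds it as the sorted union of all per-segment dictionaries. The function name and sample terms are hypothetical.

```python
from typing import Dict, List


def build_global_ord_map(segment_terms: List[List[str]]) -> List[List[int]]:
    """Map each segment-local ordinal to a global ordinal.

    segment_terms[i] is segment i's sorted, de-duplicated term list, so the
    position of a term in that list is its segment-local ordinal.
    """
    # The global term dictionary is the sorted union of every segment's terms.
    global_terms = sorted({t for terms in segment_terms for t in terms})
    global_ord: Dict[str, int] = {t: i for i, t in enumerate(global_terms)}
    # For each segment, translate local ordinal -> global ordinal.
    return [[global_ord[t] for t in terms] for terms in segment_terms]


segments = [["apple", "pear"], ["banana", "pear", "plum"]]
print(build_global_ord_map(segments))
```

Every new searcher with a changed set of segments needs this mapping rebuilt, which is why the cost is not purely per-segment.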

But even when things are mostly per-segment, major segment merges
still happen, and the cost of un-inverting those large merged segments
(when you aren't using docValues) is non-trivial.
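
As a rough illustration of what "un-inversion" means, here is a toy Python model for a single-valued field. It makes simplifying assumptions and is not Solr's field cache code. The indexed form maps term -> doc ids, but sorting and faceting want doc -> value, so the column has to be rebuilt by walking every term and posting. Because that work is proportional to the segment's size, a freshly merged large segment is expensive to un-invert. With docValues, the equivalent column is written at index time instead. The names `uninvert` and `max_doc` are hypothetical.

```python
from typing import Dict, List, Optional


def uninvert(postings: Dict[str, List[int]], max_doc: int) -> List[Optional[str]]:
    """Build a doc -> value column from a term -> doc-ids inverted index.

    Toy model of field-cache un-inversion for a single-valued field: every
    term and every posting is visited once.
    """
    column: List[Optional[str]] = [None] * max_doc  # None = doc has no value
    for term in sorted(postings):  # visit terms in sorted (index) order
        for doc_id in postings[term]:
            column[doc_id] = term
    return column


segment = {"blue": [0, 3], "green": [2], "red": [1]}
print(uninvert(segment, max_doc=5))
```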

> I admit I also find the story around docValues to be very confusing at
> the moment. Especially on the interplay with "indexed=false".

You still need "indexed=true" for efficient filters on the field.
Hence if you're faceting on a field and want to use docValues, you
probably want to keep the "indexed=true" on the field as well.
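
One way to apply this advice per field is to decide each flag from how the field will be used. The following Python sketch is a hypothetical helper, not part of Solr. It builds a schema.xml `<field>` element using the standard `name`, `type`, `indexed`, `stored` and `docValues` attributes. In this mapping, docValues serve sorting and faceting, indexed="true" serves filter queries, and stored="true" serves returning the value. The field name "format" and the type name "string" are assumptions modeled on the example schemas.

```python
import xml.etree.ElementTree as ET


def solr_field(name: str, field_type: str, *, facet_or_sort: bool,
               filter_on: bool, display: bool) -> ET.Element:
    """Build a schema.xml <field> element from how the field will be used.

    Hypothetical helper: docValues for sorting/faceting, indexed for
    filtering, stored for returning the value in results.
    """
    def flag(enabled: bool) -> str:
        return "true" if enabled else "false"

    return ET.Element("field", {
        "name": name,
        "type": field_type,
        "indexed": flag(filter_on),
        "stored": flag(display),
        "docValues": flag(facet_or_sort),
    })


# A field used both for faceting and for fq filters keeps indexed="true".
facet_field = solr_field("format", "string", facet_or_sort=True,
                         filter_on=True, display=False)
print(ET.tostring(facet_field, encoding="unicode"))
```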

-Yonik


> It would
> make a VERY good article to have this clarified somehow by people in
> the know.
>
> Regards,
>    Alex.
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 9 November 2015 at 11:04, Yonik Seeley <ysee...@gmail.com> wrote:
>> On Mon, Nov 9, 2015 at 10:55 AM, Demian Katz <demian.k...@villanova.edu> 
>> wrote:
>>> I understand that by adding "docValues=true" to some of my fields, I can 
>>> improve sorting/faceting performance.
>>
>> I don't think this is true in the general sense.
>> docValues are built at index-time, so what you will save is initial
>> un-inversion time (i.e. the first time a field is used after a new
>> searcher is opened).
>> After that point, docValues may be slightly slower.
>>
>> The other advantage of docValues is memory use... much/most of it is
>> essentially "off-heap", being memory-mapped from disk.  This cuts down
>> on memory issues and helps reduce longer GC pauses.
>>
>> docValues are good in general, and I think we should default to them
>> more for Solr 6, but they are not better in all ways.
>>
>>> However, I have a couple of questions:
>>>
>>>
>>> 1.)    Will Solr always take proper advantage of docValues when it is 
>>> turned on
>>
>> Yes.
>>
>>> , or will I gain greater performance by turning off stored/indexed in 
>>> situations where only docValues are necessary (e.g. a sort-only field)?
>>>
>>> 2.)    Will adding docValues to a field introduce significant performance 
>>> penalties for non-docValues uses of that field, beyond the obvious fact 
>>> that the additional data will consume more disk and memory?
>>
>> No, it's a separate part of the index.
>>
>> -Yonik
>>
>>
>>> I'm asking this question because the existing schema has some multi-purpose 
>>> fields, and I'm trying to determine whether I should just add 
>>> "docValues=true" wherever it might help, or if I need to take a more 
>>> thoughtful approach and potentially split some fields with copyFields, etc. 
>>> This is particularly significant because my schema makes use of some 
>>> dynamic field suffixes, and I'm not sure if I need to add new suffixes to 
>>> differentiate docValues/non-docValues fields, or if it's okay to turn on 
>>> docValues across the board "just in case."
>>>
>>> Apologies if these questions have already been answered - I couldn't find a 
>>> totally clear answer in the places I searched.
>>>
>>> Thanks!
>>>
>>> - Demian
