On Mon, Nov 9, 2015 at 11:19 AM, Alexandre Rafalovitch <[email protected]> wrote:
> I thought docValues were per segment, so the price of un-inversion was
> effectively paid on each commit for all the segments, as opposed to
> just the updated one.

Both the field cache (i.e. uninverting indexed values) and docValues are
mostly per-segment (I say mostly because some uses still require
building a global ord map).

But even when things are mostly per-segment, you hit major segment
merges and the cost of un-inversion (when you aren't using docValues)
is non-trivial.

> I admit I also find the story around docValues to be very confusing at
> the moment. Especially on the interplay with "indexed=false".

You still need "indexed=true" for efficient filters on the field.
Hence if you're faceting on a field and want to use docValues, you
probably want to keep the "indexed=true" on the field as well.

-Yonik

> It would
> make a VERY good article to have this clarified somehow by people in
> the know.
>
> Regards,
> Alex.
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 9 November 2015 at 11:04, Yonik Seeley <[email protected]> wrote:
>> On Mon, Nov 9, 2015 at 10:55 AM, Demian Katz <[email protected]>
>> wrote:
>>> I understand that by adding "docValues=true" to some of my fields, I can
>>> improve sorting/faceting performance.
>>
>> I don't think this is true in the general sense.
>> docValues are built at index-time, so what you will save is initial
>> un-inversion time (i.e. the first time a field is used after a new
>> searcher is opened).
>> After that point, docValues may be slightly slower.
>>
>> The other advantage of docValues is memory use... much/most of it is
>> essentially "off-heap", being memory-mapped from disk. This cuts down
>> on memory issues and helps reduce longer GC pauses.
>>
>> docValues are good in general, and I think we should default to them
>> more for Solr 6, but they are not better in all ways.
>>
>>> However, I have a couple of questions:
>>>
>>>
>>> 1.) Will Solr always take proper advantage of docValues when it is
>>> turned on
>>
>> Yes.
>>
>>> , or will I gain greater performance by turning off stored/indexed in
>>> situations where only docValues are necessary (e.g. a sort-only field)?
>>>
>>> 2.) Will adding docValues to a field introduce significant performance
>>> penalties for non-docValues uses of that field, beyond the obvious fact
>>> that the additional data will consume more disk and memory?
>>
>> No, it's a separate part of the index.
>>
>> -Yonik
>>
>>
>>> I'm asking this question because the existing schema has some multi-purpose
>>> fields, and I'm trying to determine whether I should just add
>>> "docValues=true" wherever it might help, or if I need to take a more
>>> thoughtful approach and potentially split some fields with copyFields, etc.
>>> This is particularly significant because my schema makes use of some
>>> dynamic field suffixes, and I'm not sure if I need to add new suffixes to
>>> differentiate docValues/non-docValues fields, or if it's okay to turn on
>>> docValues across the board "just in case."
>>>
>>> Apologies if these questions have already been answered - I couldn't find a
>>> totally clear answer in the places I searched.
>>>
>>> Thanks!
>>>
>>> - Demian
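
For reference, the two cases discussed in this thread (a faceted field that
keeps "indexed=true" for filtering, and a sort-only field) could be declared
roughly as in the schema.xml sketch below. This is only an illustration: the
field names are hypothetical, and it assumes a "string" field type backed by
solr.StrField is already defined in the schema.

```xml
<!-- Hypothetical schema.xml fragment; field names are illustrative only. -->

<!-- Faceted and filtered field: docValues for faceting,
     indexed="true" kept so filters on the field stay efficient. -->
<field name="category_facet" type="string"
       indexed="true" stored="false" docValues="true"/>

<!-- Sort-only field: docValues alone can serve sorting,
     so indexed and stored are switched off here. -->
<field name="title_sort" type="string"
       indexed="false" stored="false" docValues="true"/>
```

Changing docValues on an existing field requires a full reindex, since
docValues are built at index time.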
