On Thu, 2019-02-07 at 11:24 +0900, Yasufumi Mizoguchi wrote:
> Actually, stored is compressed but I believed that docValues was
> compressed in some strategies depending on field's values/density
> as following java doc says.
> https://lucene.apache.org/core/7_6_0/core/org/apache/lucene/codecs/lucene70/Lucene70DocValuesFormat.html
In scenarios with low diversity in Strings (city names, for example), DocValues de-duplication can work very well. It is hard to compare the size of stored fields vs. doc values in general, as the strategies are very different and the relative difference is highly dependent on content.

As for query performance, Shawn is technically correct that there will be no impact (as long as you don't use indexed=false, docValues=true). But it does influence document retrieval time. Under most circumstances the difference will be small, but if you retrieve a large number of documents or your corpus is large (measured in number of documents), it can be significant:
https://lucene.apache.org/solr/guide/7_6/docvalues.html#retrieving-docvalues-during-search

Specifically, the Solr 7 series has poor random-access performance for doc values (random access is what document retrieval uses) on indexes with many documents.

- Toke Eskildsen, Royal Danish Library
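P.S. A rough illustration of why de-duplication pays off for low-diversity Strings: string doc values keep a sorted dictionary of the unique values plus a small per-document ordinal, so repeated values are only stored once. The sketch below is a simplified model under stated assumptions, not Lucene's actual on-disk format. The class and method names are hypothetical, the raw count ignores the block compression that real stored fields apply (which also benefits from repetition), and the real terms dictionary is compressed further.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of dictionary encoding; NOT Lucene's real format.
public class DictionaryEncodingSketch {

    // Bytes needed if every document holds its own copy of the value.
    // Assumption: no compression at all (real stored fields are block-compressed).
    static long rawBytes(List<String> values) {
        long total = 0;
        for (String v : values) {
            total += v.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Bytes needed for a sorted dictionary of unique values plus one
    // fixed-width ordinal per document, packed to the minimum bit width.
    static long dictionaryBytes(List<String> values) {
        TreeSet<String> unique = new TreeSet<>(values);
        long dict = rawBytes(new ArrayList<>(unique));
        // Smallest bit width that can represent ordinals 0..unique.size()-1.
        int bitsPerOrd = Math.max(1, 64 - Long.numberOfLeadingZeros(unique.size() - 1));
        long ords = ((long) values.size() * bitsPerOrd + 7) / 8;
        return dict + ords;
    }

    public static void main(String[] args) {
        String[] cities = {"Copenhagen", "Aarhus", "Odense", "Aalborg"};
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            docs.add(cities[i % cities.length]);
        }
        System.out.println("raw bytes:        " + rawBytes(docs));
        System.out.println("dictionary bytes: " + dictionaryBytes(docs));
    }
}
```

With four distinct city names over 1,000,000 documents, this model gives 7,250,000 raw bytes vs. 250,029 bytes for the dictionary plus 2-bit ordinals. Real numbers will differ considerably, which is why a general size comparison between stored and doc values is hard to make.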