To piggyback on this, what would be the right scenarios for using docValues="true"?

On Tue, Feb 13, 2018 at 1:10 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : We are using Solr 7.1.0 to index a database of addresses. We have found
> : that our index size increases massively when we add one extra field to
> : the index, even though that field is stored and not indexed, and doesn’t
>
> what about docValues?
>
> : When we run an index load without the problematic field present, the
> : Solr index size is 5.5GB. When we add the field into the index, the
> : size grows to 13.3GB. The field itself is a maximum of 46 characters in
> : length and on average is 19 characters. We have ~14,000,000 rows in
> : total to index of which only ~200,000 have this field present at all
> : (i.e. not null in database). Given that we don’t want to index the
> : field, only store it I would have thought (perhaps naively) that the
> : storage increase would be approximately 200,000 * 19 = 3.8M bytes =
> : 3.6MB rather than the 7.5GB we are seeing.
>
> if the field has docValues enabled, then there will be some overhead for
> every doc in the index -- even the ones that don't have a value in this
> field. (although i'd still be very surprised if it accounted for 7G)
>
> : - The problematic field is created through the API as follows:
> :
> : curl -X POST -H 'Content-type:application/json' --data-binary '{
> :   "add-field":{
> :     "name":"buildingName",
> :     "type":"string",
> :     "stored":true,
> :     "indexed":false
> :   }
> : }' http://localhost:8983/solr/address/schema
>
> ...that's going to cause the field to inherit any (non-overridden)
> settings from the fieldType "string" -- in the 7.1 _default configset,
> "string" is defined with docValues="true"
>
> You can see *all* properties set on a field -- regardless of whether they
> are set on the fieldType, or are implicit hardcoded defaults in the
> implementation of the fieldType -- via the 'showDefaults=true' Schema API
> option.
>
> Consider these API examples from the techproducts demo...
>
> $ curl 'http://localhost:8983/solr/techproducts/schema/fields/cat'
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":0},
>   "field":{
>     "name":"cat",
>     "type":"string",
>     "multiValued":true,
>     "indexed":true,
>     "stored":true}}
>
> $ curl 'http://localhost:8983/solr/techproducts/schema/fields/cat?showDefaults=true'
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":0},
>   "field":{
>     "name":"cat",
>     "type":"string",
>     "indexed":true,
>     "stored":true,
>     "docValues":false,
>     "termVectors":false,
>     "termPositions":false,
>     "termOffsets":false,
>     "termPayloads":false,
>     "omitNorms":true,
>     "omitTermFreqAndPositions":true,
>     "omitPositions":false,
>     "storeOffsetsWithPositions":false,
>     "multiValued":true,
>     "large":false,
>     "sortMissingLast":true,
>     "required":false,
>     "tokenized":false,
>     "useDocValuesAsStored":true}}
>
> -Hoss
> http://www.lucidworks.com/
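
For reference, a rough sketch of what the explanation above seems to imply: the same add-field request from the original post, but with docValues overridden explicitly so it is not inherited from the "string" fieldType. This is only an assumption about how one might apply the advice, not something tested in this thread. The SOLR_URL variable is a hypothetical convenience and is not part of Solr; if the field already exists, the Schema API's replace-field command would likely be needed instead of add-field, followed by a full reindex.

```shell
#!/bin/sh
# Sketch only: add-field payload from the original post, plus an explicit
# "docValues":false so the setting is not inherited from the fieldType.
PAYLOAD='{
  "add-field":{
    "name":"buildingName",
    "type":"string",
    "stored":true,
    "indexed":false,
    "docValues":false
  }
}'

# Print the payload so it can be reviewed before anything is sent.
printf '%s\n' "$PAYLOAD"

# SOLR_URL is a hypothetical variable for this sketch, e.g.
#   SOLR_URL=http://localhost:8983/solr/address
# Nothing is sent unless it is set.
if [ -n "${SOLR_URL:-}" ]; then
  curl -X POST -H 'Content-type:application/json' \
       --data-binary "$PAYLOAD" "$SOLR_URL/schema"
  # Check the effective settings afterwards, including inherited defaults.
  curl "$SOLR_URL/schema/fields/buildingName?showDefaults=true"
fi
```

Running it with SOLR_URL unset only prints the JSON; the showDefaults=true call at the end is the same check Hoss describes, and should report "docValues":false for the field if the override took effect.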