To piggyback on this, what would be the right scenarios for using
docValues="true"?

On Tue, Feb 13, 2018 at 1:10 PM, Chris Hostetter <hossman_luc...@fucit.org>
wrote:

>
> : We are using Solr 7.1.0 to index a database of addresses.  We have found
> : that our index size increases massively when we add one extra field to
> : the index, even though that field is stored and not indexed, and doesn’t
>
> what about docValues?
>
> : When we run an index load without the problematic field present, the
> : Solr index size is 5.5GB.  When we add the field into the index, the
> : size grows to 13.3GB.  The field itself is a maximum of 46 characters in
> : length and on average is 19 characters. We have ~14,000,000 rows in
> : total to index of which only ~200,000 have this field present at all
> : (i.e. not null in database).  Given that we don’t want to index the
> : field, only store it I would have thought (perhaps naively) that the
> : storage increase would be approximately 200,000 * 19 = 3.8M bytes =
> : 3.6MB rather than the 7.5GB we are seeing.
>
> if the field has docValues enabled, then there will be some overhead for
> every doc in the index -- even the ones that don't have a value in this
> field.  (although i'd still be very surprised if it accounted for 7G)
>
> : - The problematic field is created through the API as follows:
> :
> :   curl -X POST -H 'Content-type:application/json' --data-binary '{
> :     "add-field":{
> :       "name":"buildingName",
> :       "type":"string",
> :       "stored":true,
> :       "indexed":false
> :     }
> :   }' http://localhost:8983/solr/address/schema
>
> ...that's going to cause the field to inherit any (non-overridden)
> settings from the fieldType "string" -- in the 7.1 _default configset,
> "string" is defined with docValues="true"
>
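A minimal sketch of how that inherited setting could be overridden, assuming the goal is a stored-only field with no docValues: the same add-field request as above, with "docValues":false set explicitly. SOLR_SCHEMA_URL is a hypothetical variable introduced only for this sketch; when it is unset, the script just prints the payload instead of sending it.

```shell
#!/bin/sh
# Sketch: same add-field request as in the original post, but with docValues
# set explicitly so it is not inherited from the "string" fieldType.
payload='{
  "add-field":{
    "name":"buildingName",
    "type":"string",
    "stored":true,
    "indexed":false,
    "docValues":false
  }
}'

# SOLR_SCHEMA_URL is a hypothetical variable (not from the original thread),
# e.g. http://localhost:8983/solr/address/schema
if [ -n "${SOLR_SCHEMA_URL:-}" ]; then
  # Send the request only when a target URL was provided.
  curl -X POST -H 'Content-type:application/json' \
       --data-binary "$payload" "$SOLR_SCHEMA_URL"
else
  # Otherwise show what would be sent.
  printf '%s\n' "$payload"
fi
```

If the field already exists in the schema, the Schema API's replace-field command would presumably be needed instead of add-field, and the data would have to be reindexed before any change in index size shows up.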
> You can see *all* properties set on a field -- regardless of whether they
> are set on the fieldType, or are implicit hardcoded defaults in the
> implementation of the fieldType -- via the 'showDefaults=true' Schema API
> option.
>
> Consider these API examples from the techproducts demo...
>
> $ curl 'http://localhost:8983/solr/techproducts/schema/fields/cat'
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":0},
>   "field":{
>     "name":"cat",
>     "type":"string",
>     "multiValued":true,
>     "indexed":true,
>     "stored":true}}
>
> $ curl 'http://localhost:8983/solr/techproducts/schema/fields/cat?showDefaults=true'
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":0},
>   "field":{
>     "name":"cat",
>     "type":"string",
>     "indexed":true,
>     "stored":true,
>     "docValues":false,
>     "termVectors":false,
>     "termPositions":false,
>     "termOffsets":false,
>     "termPayloads":false,
>     "omitNorms":true,
>     "omitTermFreqAndPositions":true,
>     "omitPositions":false,
>     "storeOffsetsWithPositions":false,
>     "multiValued":true,
>     "large":false,
>     "sortMissingLast":true,
>     "required":false,
>     "tokenized":false,
>     "useDocValuesAsStored":true}}
>
> -Hoss
> http://www.lucidworks.com/
