Hello everyone,

Our index becomes very large when norms are enabled.

schema.xml:

Field type declaration:
<fieldType name="simpleTokenizer" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
</fieldType>

Field declarations:
<field name="id" stored="true" indexed="true" required="true" type="string" />
<field name="name" stored="true" indexed="true" type="string" />
<dynamicField name="unique_*" stored="false" indexed="true" type="simpleTokenizer" multiValued="false" />

With 5000 documents, each having 2 unique fields (2 * 5000 = 10000
distinct fields in the index), the index size is 48.24 MB.
If we switch the field type to omitNorms="true", the index size drops
to 0.56 MB.

If we increase the number of unique fields per document to 3
(3 * 5000 = 15000 distinct fields in the index), we get 72.23 MB with
norms and 0.70 MB without them.
If we then increase the number of documents to 10000 (3 * 10000 = 30000
distinct fields in the index), we get 287.54 MB and 1.44 MB respectively.
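A rough back-of-the-envelope check, assuming (this is not confirmed anywhere in this thread) that the underlying Lucene version stores norms densely: one byte per document for every indexed field with norms enabled, whether or not that document actually contains the field. Under that assumption the norms alone cost (distinct fields) x (documents) bytes. The sketch below adds that estimate to the measured omitNorms="true" size and compares the result with the measured size with norms. The helper name estimated_norms_mib is hypothetical, and treating the reported "MB" as MiB is also an assumption.

```python
# Rough estimate of norms storage, ASSUMING dense norms:
# one byte per document for every field that has norms enabled,
# even when the document does not contain that field.

MIB = 1024 * 1024  # assumption: the reported "MB" figures are mebibytes


def estimated_norms_mib(num_fields: int, num_docs: int) -> float:
    """Hypothetical helper: norms size in MiB under the dense-norms assumption."""
    return num_fields * num_docs / MIB


# (distinct unique_* fields, documents, measured MB without norms, measured MB with norms)
cases = [
    (10_000, 5_000, 0.56, 48.24),
    (15_000, 5_000, 0.70, 72.23),
    (30_000, 10_000, 1.44, 287.54),
]

for fields, docs, without_norms, with_norms in cases:
    # Predicted size = index without norms + estimated dense norms
    predicted = without_norms + estimated_norms_mib(fields, docs)
    print(f"{fields} fields x {docs} docs: "
          f"predicted {predicted:.2f} MB, measured {with_norms:.2f} MB")
```

If the assumption holds, the norms cost scales with the product of distinct fields and documents rather than with the amount of data actually indexed, which would be consistent with the third case being roughly four times the second (30000 x 10000 vs. 15000 x 5000).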

We've prepared a test application that reproduces this behavior. It can
be downloaded here:
https://bitbucket.org/coldserenity/solr-large-index-with-norms

Could anyone tell us whether the index size is expected in these cases?
And if it is, what configuration can we apply to reduce the size of the
index?

Thank you in advance, Ivan
