Thanks for the reply. I can see that in Solr 6, more than 50% of the index
directory is occupied by files with the ".nvd" extension. As far as I can
tell, these files hold norms data.
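
In case anyone wants to repeat the check, here is a rough sketch of how the
per-extension totals can be tallied. The function names and the example path
in the comment are placeholders of mine, not anything Solr provides.

```python
import os
from collections import defaultdict


def size_by_extension(index_dir):
    """Return {extension: total_bytes} for every file under index_dir."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            # Files without an extension (e.g. segments_N) are grouped together.
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext] += os.path.getsize(os.path.join(root, name))
    return dict(totals)


def report(index_dir):
    """Print extensions sorted by total size, largest first."""
    totals = size_by_extension(index_dir)
    grand_total = sum(totals.values()) or 1  # avoid dividing by zero
    for ext, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{ext:10s} {size / 2**20:12.1f} MiB {100 * size / grand_total:5.1f}%")


# Hypothetical usage; substitute the real index directory of your core:
# report("/path/to/solr/server/solr/collection1/data/index")
```

Running it against both the Solr 5 and the Solr 6 index directories should
show which extensions account for the growth.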

On Tue, Feb 21, 2017 at 10:27 AM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> Did you look in the data directories to check what index file extensions
> contribute most to the difference? That could give a hint.
>
> Regards,
>     Alex
>
> On 21 Feb 2017 9:47 AM, "Pratik Patel" <pra...@semandex.net> wrote:
>
> > Here is the same question on Stack Overflow, where it is better
> > formatted.
> >
> > http://stackoverflow.com/questions/42370231/solr-dynamic-field-blowing-up-the-index-size
> >
> > Recently, I upgraded from Solr 5.0 to Solr 6.4.1. My app runs fine, but
> > the index with Solr 6 is far too large. In Solr 5 the index was about
> > 15GB; in Solr 6, for the same data, it is 300GB! I cannot work out what
> > contributes to such a huge difference in Solr 6.
> >
> > I have been able to identify a field which is blowing up the size of the
> > index. It is as follows.
> >
> > <dynamicField name="*_note" type="text_general" indexed="true"
> > stored="true" multiValued="true"  />
> >
> > <field name="textproperty" type="text_general" indexed="true"
> > stored="false" multiValued="true"  />
> > <copyField source="*_note" dest="textproperty"/>
> >
> > When this field is commented out, the index size drops to less than
> > 10GB.
> >
> > This field is of type text_general. The definition of this type
> > follows.
> >
> > <fieldType name="text_general" class="solr.TextField"
> > positionIncrementGap="100">
> >       <analyzer type="index">
> >         <charFilter class="solr.HTMLStripCharFilterFactory" />
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <charFilter class="solr.PatternReplaceCharFilterFactory"
> > pattern="((?m)[a-z]+)'s" replacement="$1s" />
> >         <filter class="solr.WordDelimiterFilterFactory"
> > protected="protwords.txt" generateWordParts="1"
> > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > catenateAll="0" splitOnCaseChange="0"/>
> >         <filter class="solr.KStemFilterFactory" />
> >         <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="C:/Users/pratik/Desktop/solr-6.4.1_playground/solr-6.4.1/server/solr/collection1/conf/stopwords.txt"
> > />
> >       </analyzer>
> >       <analyzer type="query">
> >         <charFilter class="solr.HTMLStripCharFilterFactory" />
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <charFilter class="solr.PatternReplaceCharFilterFactory"
> > pattern="((?m)[a-z]+)'s" replacement="$1s" />
> >         <filter class="solr.WordDelimiterFilterFactory"
> > protected="protwords.txt" generateWordParts="1"
> > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > catenateAll="0" splitOnCaseChange="0"/>
> >         <filter class="solr.KStemFilterFactory" />
> >         <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="C:/Users/pratik/Desktop/solr-6.4.1_playground/solr-6.4.1/server/solr/collection1/conf/stopwords.txt"
> > />
> >       </analyzer>
> >   </fieldType>
> >
> > A few things I did to debug this issue:
> >
> >    - I have ensured that the field type definition is the same as the
> >    one I was using in Solr 5 and that it is also valid in version 6.
> >    This field type takes a list of stopwords to be ignored during
> >    indexing. I have supplied the same list of stopwords that we were
> >    using in Solr 5. I have verified that the path of this file is
> >    correct and that it loads fine in the Solr admin UI. When I analyse
> >    these fields using the "Analysis" tab of the Solr admin UI, I can
> >    see that stopwords are being filtered out. However, when I query
> >    with some of these stopwords, I do get results back, which makes me
> >    think that the stopwords are probably being indexed.
> >
> > Any idea what could increase the size of the index by so much in Solr 6?
> >
>
