I haven't had time to read your mail in depth, but I bet it is because you
are overwriting docs. When a doc is overwritten, the old version is marked
as deleted and the new version is inserted, leaving both versions of your
doc physically in your index. When you query, though, the deleted one is
filtered out.

At some point later, when the commits you have made result in too many
segments, a merge will be triggered, and this will remove the deleted
documents from the merged segments.
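To make the overwrite-then-merge behaviour concrete, here is a minimal toy
model. It is only a sketch: ToyIndex and its methods are hypothetical names
invented for illustration, not Lucene or Solr APIs, and real segments are far
more involved than a single list.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Lucene's real data structure or API.
class ToyIndex {
    // One physically stored document; a deleted one stays until a merge.
    private static final class Slot {
        final String id;
        boolean deleted;

        Slot(String id) {
            this.id = id;
        }
    }

    private final List<Slot> slots = new ArrayList<>();

    // Adding a doc whose id already exists marks the old copy as deleted,
    // then appends the new copy. Nothing is physically removed here.
    void addOrOverwrite(String id) {
        for (Slot s : slots) {
            if (!s.deleted && s.id.equals(id)) {
                s.deleted = true;
            }
        }
        slots.add(new Slot(id));
    }

    // Live documents only: what a query can see.
    int numDocs() {
        int live = 0;
        for (Slot s : slots) {
            if (!s.deleted) {
                live++;
            }
        }
        return live;
    }

    // Every physically present document, deleted or not.
    int maxDoc() {
        return slots.size();
    }

    // A merge rewrites the data and drops the deleted documents.
    void merge() {
        slots.removeIf(s -> s.deleted);
    }
}
```

In this model, indexing the same 1000 ids twice leaves 1000 live docs but
2000 physical ones, and the count only drops back to 1000 after merge().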

Compare numDocs (the number of undeleted docs) and maxDoc (the number of
documents, whether deleted or not) for your index. I bet one will be
roughly 2x the other.
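The comparison is simple arithmetic on the two counters. A small sketch,
assuming you have already read both numbers from the admin page;
IndexStats and deletedFraction are hypothetical names, not part of SolrJ:

```java
// Hypothetical helper, not a Solr or SolrJ class.
class IndexStats {
    // Fraction of physically stored docs that are marked as deleted.
    // numDocs = live docs, maxDoc = live + deleted docs.
    static double deletedFraction(long numDocs, long maxDoc) {
        if (numDocs < 0 || maxDoc < numDocs) {
            throw new IllegalArgumentException("expected 0 <= numDocs <= maxDoc");
        }
        if (maxDoc == 0) {
            return 0.0; // empty index: nothing stored, nothing deleted
        }
        return (double) (maxDoc - numDocs) / maxDoc;
    }
}
```

If every doc had been overwritten exactly once and no merge had run yet,
maxDoc would be twice numDocs and this fraction would come out at 0.5.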

Upayavira

On Mon, Feb 15, 2016, at 08:12 PM, Steven White wrote:
> Hi folks,
> 
> I'm fixing code that I noticed to have a defect.  My expectation was that
> once I make the fix, the index size will be smaller but instead I see it
> growing.
> 
> Here is the stripped down version of the code to show the issue:
> 
> Buggy code #1:
> 
>   for (String field : fieldsList)
>   {
>     doc.addField(SolrField_ID_LIST, "1"); // <== Notice how I'm adding the same value over and over
>     doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
>   }
> 
>   docsToAdd.add(doc);
> 
> Fixed code #2:
> 
>   for (String field : fieldsList)
>   {
>     doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
>   }
> 
>   doc.addField(SolrField_ID_LIST, "1"); // <== Notice how I'm now adding this value only once
> 
>   docsToAdd.add(doc);
> 
> I index the exact same data in both cases; all that changed is the logic
> of the code per the above.
> 
> On my test index of 1000 records, when I look at Solr's admin page (same
> is true looking at the physical disk in the "index" folder) the index size
> for #1 is 834.77 KB, but for #2 it is 1.56 MB.
> 
> As a side test, I changed the code to the following:
> 
> Test code #3:
> 
>   for (String field : fieldsList)
>   {
>     doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
>   }
> 
>   // doc.addField(SolrField_ID_LIST, "1"); // <== I no longer include this field
> 
>   docsToAdd.add(doc);
> 
> And now the index size is 2.27 MB !!!
> 
> Yes, each time I run the test, I start with a fresh empty index (num
> docs: 0, index size: 0).
> 
> Here are my field definitions:
> 
>   <field name="ALL_FIELDS_DATA" type="text" multiValued="true"
>          indexed="true" required="false" stored="false"/>
>   <field name="ID_LIST" type="string" multiValued="true" indexed="true"
>          required="false" stored="false"/>
> 
> My question is: why is my index size going up?  I was expecting it to go
> down because I'm now indexing less data into each Solr document.
> 
> Thanks
> 
> Steve
