Thank you for the explanation!

Rebecca Tang
Applications Developer, UCSF CKM
Industry Documents Digital Libraries
E: rebecca.t...@ucsf.edu

On 12/19/14 12:37 PM, "Shawn Heisey" <apa...@elyograg.org> wrote:

>On 12/19/2014 11:22 AM, Tang, Rebecca wrote:
>> I have an index that has a field called collection_facet.
>>
>> There was a value 'Ness Motley Law Firm Documents' that we wanted to
>>update to 'Ness Motley Law Firm'.  There were 36,132 records with this
>>value.  So I re-indexed just the 36,132 records.  After the update, I
>>ran a facet query (q=*:*&facet=true&facet.field=collection_facet) to see
>>if the value got updated and I saw
>> Ness Motley Law Firm 36,132  -- as expected
>> Ness Motley Law Firm Documents 0 <-- Why is this value still here even
>>though clearly there are no records with this value anymore?  I thought
>>maybe it was cached, so I restarted solr, but I still got the same
>>results.
>>
>> "facet_fields": { "collection_facet": [
>>   "Ness Motley Law Firm", 36132,
>>   "Ness Motley Law Firm Documents", 0 ]
>
>Updating a document in Solr is actually a delete of the old document
>followed by indexing a new version.
>
>When a document is deleted from an index, Lucene (the search API that
>Solr uses) does not actually remove that document from the index
>segment; it just writes an ID value to a file that tracks deletes.  That
>document is still in the index, and its terms are still present, but the
>software can remove it from any results when it sees that ID value in
>the delete tracking file(s).  Only a segment merge can eliminate the
>document and remove its terms from the inverted index.
>
>When you do a facet on that field, Lucene still sees "Ness Motley Law
>Firm Documents" in the inverted index, because nothing has actually
>removed it.  The upper layers of the Solr faceting code are aware that
>all the documents containing that term have been deleted, so they report
>a correct document count of zero.
>
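The behaviour described above can be sketched with a toy model. This is only an illustration under simplifying assumptions, not Lucene's actual data structures: the class ToySegment and all of its methods are hypothetical names, and a real update would normally write the new document into a different segment.

```python
# Toy model only: hypothetical names, NOT Lucene's real implementation.
class ToySegment:
    def __init__(self):
        self.docs = {}        # doc id -> field value
        self.postings = {}    # term -> set of doc ids (the "inverted index")
        self.deleted = set()  # ids recorded in the "delete tracking file"

    def add(self, doc_id, value):
        self.docs[doc_id] = value
        self.postings.setdefault(value, set()).add(doc_id)

    def delete(self, doc_id):
        # Only mark the document; its term stays in the postings.
        self.deleted.add(doc_id)

    def facet_counts(self, mincount=0):
        # Every term in the postings is visited, but only live docs count.
        counts = {}
        for term, ids in self.postings.items():
            live = len(ids - self.deleted)
            if live >= mincount:
                counts[term] = live
        return counts

    def merged(self):
        # A merge rewrites the segment from live documents only,
        # so terms that belonged solely to deleted docs disappear.
        new = ToySegment()
        for doc_id, value in self.docs.items():
            if doc_id not in self.deleted:
                new.add(doc_id, value)
        return new


seg = ToySegment()
seg.add(1, "Ness Motley Law Firm Documents")
# An "update" is a delete of the old document plus an add of the new one.
seg.delete(1)
seg.add(2, "Ness Motley Law Firm")

print(seg.facet_counts())            # old term still listed, with count 0
print(seg.facet_counts(mincount=1))  # zero-count term filtered out
print(seg.merged().facet_counts())   # old term gone after the merge
```
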
>To eliminate it from the results, you have two choices.  One is to set
>facet.mincount=1 as a parameter on your query, the other is to run an
>optimize (also known as a forceMerge down to one segment) on the index.
>
>Thanks,
>Shawn
>
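
For reference, the two options Shawn describes could be expressed as request URLs roughly like this. It is a minimal sketch: the base URL (host, port and the core name "collection1") is an assumption and not taken from this thread, and the helper names are hypothetical. The q, facet and facet.field parameters come from the original query; facet.mincount, optimize and maxSegments are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumption: local Solr with a core named "collection1"; adjust as needed.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def facet_query_url(base, field, mincount=1):
    """Option 1: same facet query as before, hiding zero-count values."""
    params = {
        "q": "*:*",
        "facet": "true",
        "facet.field": field,
        "facet.mincount": mincount,
    }
    return base + "/select?" + urlencode(params)


def optimize_url(base):
    """Option 2: ask the update handler for a forceMerge to one segment."""
    params = {"optimize": "true", "maxSegments": 1}
    return base + "/update?" + urlencode(params)


print(facet_query_url(SOLR_BASE, "collection_facet"))
print(optimize_url(SOLR_BASE))
```

Setting facet.mincount only affects what the query returns, whereas an optimize rewrites the whole index and can take a long time on a large one.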
