Thank you for the explanation!

Rebecca Tang
Applications Developer, UCSF CKM
Industry Documents Digital Libraries
E: rebecca.t...@ucsf.edu

On 12/19/14 12:37 PM, "Shawn Heisey" <apa...@elyograg.org> wrote:

>On 12/19/2014 11:22 AM, Tang, Rebecca wrote:
>> I have an index that has a field called collection_facet.
>>
>> There was a value 'Ness Motley Law Firm Documents' that we wanted to
>>update to 'Ness Motley Law Firm'. There were 36,132 records with this
>>value. So I re-indexed just the 36,132 records. After the update, I
>>ran a facet query (q=*:*&facet=true&facet.field=collection_facet) to see
>>if the value got updated and I saw
>> Ness Motley Law Firm 36,132 -- as expected
>> Ness Motley Law Firm Documents 0 <-- Why is this value still here even
>>though clearly there are no records with this value anymore? I thought
>>maybe it was cached, so I restarted solr, but I still got the same
>>results.
>>
>> "facet_fields": { "collection_facet": [
>>   "Ness Motley Law Firm", 36132,
>>   "Ness Motley Law Firm Documents", 0 ]
>
>Updating a document in Solr is actually a delete of the old document
>followed by indexing a new version.
>
>When a document is deleted from an index, Lucene (the search API that
>Solr uses) does not actually remove that document from the index
>segment, it just writes an ID value to a file that tracks deletes. That
>document is still in the index, and its terms are still present, but the
>software can remove it from any results when it sees that ID value in
>the delete tracking file(s). Only a segment merge can eliminate the
>document and remove its terms from the inverted index.
>
>When you do a facet on that field, Lucene still sees "Ness Motley Law
>Firm Documents" in the inverted index, because nothing has actually
>removed it. The upper layers of Solr faceting code are aware that all
>the documents containing that term have been deleted, so it gets a
>correct document count of zero.
>
>To eliminate it from the results, you have two choices. One is to set
>facet.mincount=1 as a parameter on your query, the other is to run an
>optimize (also known as a forceMerge down to one segment) on the index.
>
>Thanks,
>Shawn
>
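
For anyone who finds this thread later, here is a rough Python sketch of the two options Shawn describes. It only builds the request URLs and does not contact Solr. The host, port, and core name ("mycore") are placeholders I made up, and the helper function names are my own, so adjust them for your setup. The last helper is an assumption-level extra: it filters zero-count terms on the client side, which gives the same visible result as facet.mincount=1 if you cannot change the query.

```python
from urllib.parse import urlencode

# Hypothetical core URL; substitute your own host, port, and core name.
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"


def facet_query_url(field, mincount=1, base=SOLR_CORE_URL):
    """Build the facet query from the thread, plus facet.mincount so that
    terms with fewer than `mincount` matching documents are not returned."""
    params = {
        "q": "*:*",
        "facet": "true",
        "facet.field": field,
        "facet.mincount": mincount,
    }
    return base + "/select?" + urlencode(params)


def optimize_url(base=SOLR_CORE_URL):
    """Build an update URL asking Solr to optimize the index, which merges
    segments and drops terms that belong only to deleted documents."""
    return base + "/update?" + urlencode({"optimize": "true"})


def drop_empty_facets(flat):
    """Convert Solr's flat [term, count, term, count, ...] facet list into a
    dict, skipping terms whose count is zero."""
    pairs = zip(flat[0::2], flat[1::2])
    return {term: count for term, count in pairs if count > 0}


if __name__ == "__main__":
    print(facet_query_url("collection_facet"))
    print(optimize_url())
    print(drop_empty_facets(
        ["Ness Motley Law Firm", 36132, "Ness Motley Law Firm Documents", 0]
    ))
```

Setting facet.mincount=1 is the cheap option because it only changes what the query returns. An optimize rewrites the index segments, so on a large index it can take a while and is probably better run at a quiet time.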