On Fri, 2017-01-13 at 14:19 +0000, Sebastian Riemer wrote:
> the second search should have been this:
> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22&indent=on&q=*:*&rows=0&start=0&wt=json
> (or in other words, give me all documents having value "1" for field
> "m_mediaType_s")
> 
> Since this search gives zero results, why is it included in the
> facet.fields result-count list?

Qualified guess (I don't know the JSON faceting code in detail):
The list of possible facet values is extracted from the DocValues
structure in the segment files, without respect to documents marked as
deleted. At some point you had one or more documents with
m_mediaType_s:1, which were later deleted.

If your index is not too large, you can verify this by optimizing down
to 1 segment, which will remove all traces of deleted documents (unless
the index is already 1 segment).
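
A minimal sketch of how such an optimize request could be built, assuming
the host and the core name "wemi" from the quoted URL and Solr's standard
/update handler with its optimize and maxSegments parameters. The helper
name optimize_url is made up for illustration; the snippet only builds the
URL and does not send anything.

```python
from urllib.parse import urlencode

# Assumed values: host and core name are copied from the quoted request.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "wemi"


def optimize_url(max_segments=1):
    """Build an update request that merges the index down to max_segments."""
    params = {"optimize": "true", "maxSegments": max_segments, "wt": "json"}
    return f"{SOLR_BASE}/{CORE}/update?{urlencode(params)}"


print(optimize_url())
```

Fetching the printed URL (with curl or urllib.request.urlopen) and then
repeating the facet request should show whether the "1" entry disappears.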

If you cannot live with the false terms, committing with
expungeDeletes=true should do the trick, although it is likely to make
your indexing process a lot heavier.
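
As a rough illustration, a commit with expungeDeletes can be sent to the
/update handler as request parameters. The host and core name below are
assumptions taken from the quoted URL, and expunge_deletes_url is a
hypothetical helper that only builds the URL.

```python
from urllib.parse import urlencode


def expunge_deletes_url(base="http://localhost:8983/solr", core="wemi"):
    """Build a commit request that also merges away segments with deletes."""
    # base and core are assumed defaults, not values confirmed for your setup.
    params = {"commit": "true", "expungeDeletes": "true", "wt": "json"}
    return f"{base}/{core}/update?{urlencode(params)}"


print(expunge_deletes_url())
```

Issuing this on every commit is what makes indexing heavier, so it may be
worth doing only after batches that contain deletes.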

The reason for this inaccuracy is that it is quite heavy to verify
whether a docvalue is referenced by a document: Each time one or more
documents in a segment are deleted, all references from all documents
in that segment would have to be checked to create a correct mapping.
As this only affects mincount=0 combined with your use case where
_all_ documents with a certain docvalue are deleted, my guess is that
it is seen as too much of an edge case to handle.
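
To make the guess above concrete, here is a toy model in Python. It is not
Lucene's or Solr's actual data structure, only an illustration of the
assumed behaviour: the segment keeps its list of terms after a delete,
while counts are taken from live documents only. All names are invented
for the example.

```python
def facet_counts(segment_terms, docs, deleted, mincount=0):
    """Count live documents per term, starting from the segment's term list.

    segment_terms: terms stored in the segment (unchanged by deletes)
    docs:          doc id -> term for every document written to the segment
    deleted:       ids of documents marked as deleted
    """
    counts = dict.fromkeys(sorted(segment_terms), 0)
    for doc_id, term in docs.items():
        if doc_id not in deleted:
            counts[term] += 1
    return {term: n for term, n in counts.items() if n >= mincount}


docs = {1: "1", 2: "2", 3: "2"}   # doc 1 is the only one with value "1"
terms = set(docs.values())        # term list fixed when the segment is written
deleted = {1}                     # doc 1 is later deleted

print(facet_counts(terms, docs, deleted, mincount=0))  # {'1': 0, '2': 2}
print(facet_counts(terms, docs, deleted, mincount=1))  # {'2': 2}
```

In this model the stale "1" entry only shows up with mincount=0, which
matches the observation that a mincount of 1 hides the problem.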
-- 
Toke Eskildsen, Royal Danish Library
