Re: Solr 7.0.1 Duplicate document appearing in search results

2019-05-15 Thread Erick Erickson
> On May 15, 2019, at 10:53 AM, Erick Erickson wrote:
>
> Or something unexpected like there being no <uniqueKey> defined in the schema
> somehow.

Meant to say that somehow the schemas used during your process weren’t what you thought they were and “somehow” didn’t have a <uniqueKey> defined. That would require

Re: Solr 7.0.1 Duplicate document appearing in search results

2019-05-15 Thread Erick Erickson
> On May 14, 2019, at 7:46 PM, Adam Walz wrote:
>
> but do
> use an external map reduce process to reindex

Here’s where I’d look then. Not knowing any details of your process, this may be totally wrong of course…. If there’s any step that performs a MERGEINDEX operation, _and_ somehow the sa

Re: Solr 7.0.1 Duplicate document appearing in search results

2019-05-14 Thread Adam Walz
Thanks Erick, We've never merged indexes. We don't use the MapReduceIndexerTool, but do use an external map reduce process to reindex. To reindex from an empty state we have a map reduce job which runs on a separate HBase cluster and indexes into this shard. During this job each mapper is concurre

Re: Solr 7.0.1 Duplicate document appearing in search results

2019-05-14 Thread Erick Erickson
This is indeed strange. First of all, forget about explanations that involve the transaction log etc. When Lucene opens a searcher, it is only for closed segments; the tlog has nothing to do with that. Have you ever merged indexes? The MapReduceIndexerTool, if you ever used it, does not de-dupl

Solr 7.0.1 Duplicate document appearing in search results

2019-05-14 Thread Adam Walz
In my Solr schema I have set a uniqueKey of "id" where the id field is a solr.StrField. When querying with this field as a filter I would expect to always get 1 or 0 documents as a result. However I am getting back multiple documents with the same "id" field, but different internal `docid`s. This p
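
For anyone trying to confirm the same symptom, here is a rough sketch (not something from the original poster's setup) of two checks in Python. The first builds a facet request that asks Solr for every "id" value occurring more than once, using the standard `facet.field` and `facet.mincount` parameters. The second counts repeated ids in a list of already-fetched documents. The host, the collection name `mycoll`, and both function names are assumptions made up for illustration.

```python
from collections import Counter
from urllib.parse import urlencode


def duplicate_id_query(base_url, collection, unique_key="id"):
    """Build a select URL that facets on the uniqueKey field and keeps
    only values seen at least twice (facet.mincount=2).

    base_url and collection are hypothetical; adjust to your install.
    """
    params = {
        "q": "*:*",
        "rows": 0,               # only the facet counts are needed
        "facet": "true",
        "facet.field": unique_key,
        "facet.mincount": 2,     # hide ids that appear exactly once
        "facet.limit": -1,       # no cap; can be expensive on big indexes
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def duplicated_ids(docs, unique_key="id"):
    """Return {id: count} for ids that occur more than once in docs,
    where docs is a list of dicts such as response["response"]["docs"]."""
    counts = Counter(doc[unique_key] for doc in docs)
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    print(duplicate_id_query("http://localhost:8983/solr", "mycoll"))
    sample = [{"id": "a"}, {"id": "b"}, {"id": "a"}]
    print(duplicated_ids(sample))
```

If the facet request returns any buckets at all, the index really does hold more than one live document per key, which points at how the documents got into the index rather than at the query.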