Re: Consecutive calls to a query give different results

2017-09-08 Thread Erick Erickson
Here's Mike McCandless' blog on the topic: https://www.elastic.co/blog/lucenes-handling-of-deleted-documents The same options he mentions are available in Solr as both use Lucene under the covers. The long and short of it is that you can have a significant amount of deleted documents in your

Re: Consecutive calls to a query give different results

2017-09-08 Thread Webster Homer
Thank you, Erick Erickson and Shawn Heisey for your excellent answers. For some of our collections, it would seem that an occasional optimize would be a good thing. However, we have some collections that are updated constantly. Would using the commit expungeDeletes help mitigate the issue? I also

Re: Consecutive calls to a query give different results

2017-09-08 Thread Shawn Heisey
On 9/7/2017 8:54 AM, Webster Homer wrote: > I am not concerned about deleted documents. I am concerned that the same > search gives different results after each search. The top document seems to > cycle between 3 different documents > > I have an enhanced collections info api call that calls the

Re: Consecutive calls to a query give different results

2017-09-08 Thread Webster Homer
We have several cloud collections, but this one is updated once a day with a partial load, and once a week with a full load, followed by a delete which is based upon an index_date field (timestamp of the solr record). For this and related collections optimizing once per day is probably

Re: Consecutive calls to a query give different results

2017-09-07 Thread Erick Erickson
bq: So apparently it IS essential to run optimize after a data load. Don't do this if you can avoid it; you run the risk of excessive amounts of your index consisting of deleted documents unless you are following a process whereby you periodically (and I'm talking at least hours, if not once per

Re: Consecutive calls to a query give different results

2017-09-07 Thread Webster Homer
We have several solr clouds, a couple of them have only 1 replica per shard. We have never observed the problem when we have a single replica, only when there are multiple replicas per shard. On Thu, Sep 7, 2017 at 10:08 AM, Webster Homer wrote: > the scores are not the

Re: Consecutive calls to a query give different results

2017-09-07 Thread Webster Homer
the scores are not the same. Doc 305340: 432.44238; C2646: 428.24185; 12837: 430.61722. One other thing. I just ran optimize and now document 305340 is consistently the top score. So apparently it IS essential to run optimize after a data load. Note we see this behavior fairly commonly on our

Re: Consecutive calls to a query give different results

2017-09-07 Thread Webster Homer
the scores are not the same Doc 305340 432.44238 On Thu, Sep 7, 2017 at 10:02 AM, David Hastings < hastings.recurs...@gmail.com> wrote: > "I am concerned that the same > search gives different results after each search. The top document seems to > cycle between 3 different documents" > > > if

Re: Consecutive calls to a query give different results

2017-09-07 Thread David Hastings
"I am concerned that the same search gives different results after each search. The top document seems to cycle between 3 different documents" if you do debug query on the search, are the scores for the top 3 documents the same or not? you can easily have three documents with the same score, so

Re: Consecutive calls to a query give different results

2017-09-07 Thread Webster Homer
I am not concerned about deleted documents. I am concerned that the same search gives different results after each search. The top document seems to cycle between 3 different documents. I have an enhanced collections info api call that calls the core admin api to get the index information for the

Re: Consecutive calls to a query give different results

2017-09-07 Thread Erick Erickson
Whew! I haven't been lying to people for _years_.. On Thu, Sep 7, 2017 at 5:58 AM, Yonik Seeley wrote: > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson > wrote: >> bq: and deleted documents are irrelevant to term statistics... >> >> Did you mean

Re: Consecutive calls to a query give different results

2017-09-07 Thread Yonik Seeley
On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson wrote: > bq: and deleted documents are irrelevant to term statistics... > > Did you mean "relevant"? Or do I have to adjust my thinking _again_? One can make it work either way ;-) Whether a document is marked as deleted or

Re: Consecutive calls to a query give different results

2017-09-06 Thread Erick Erickson
bq: and deleted documents are irrelevant to term statistics... Did you mean "relevant"? Or do I have to adjust my thinking _again_? Erick On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley wrote: > Different replicas of the same shard can have different numbers of > deleted

Re: Consecutive calls to a query give different results

2017-09-06 Thread Yonik Seeley
Different replicas of the same shard can have different numbers of deleted documents (really just marked as deleted), and deleted documents are irrelevant to term statistics (like the number of documents a term appears in). Documents marked for deletion stop contributing to corpus statistics when
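
A rough illustration of the point above, not taken from the thread: the sketch below uses the standard Lucene BM25 idf formula and assumes that documents marked as deleted are still counted in the term statistics until their segments are merged. The replica counts and the helper names (idf_seen_by, purge_deleted) are made up for the example.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """Lucene-style BM25 idf: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical replicas of one shard: same live documents, but replica B
# still carries documents that are only marked as deleted.
replica_a = {"live": 1000, "deleted": 0, "live_with_term": 10, "deleted_with_term": 0}
replica_b = {"live": 1000, "deleted": 200, "live_with_term": 10, "deleted_with_term": 5}


def idf_seen_by(replica: dict) -> float:
    """idf computed from statistics that still include marked-as-deleted docs."""
    doc_count = replica["live"] + replica["deleted"]
    doc_freq = replica["live_with_term"] + replica["deleted_with_term"]
    return bm25_idf(doc_freq, doc_count)


def purge_deleted(replica: dict) -> dict:
    """Model a merge that physically removes the deleted documents."""
    return {**replica, "deleted": 0, "deleted_with_term": 0}


idf_a = idf_seen_by(replica_a)
idf_b = idf_seen_by(replica_b)
idf_a_merged = idf_seen_by(purge_deleted(replica_a))
idf_b_merged = idf_seen_by(purge_deleted(replica_b))

print(f"before merge: A={idf_a:.4f} B={idf_b:.4f}")
print(f"after merge:  A={idf_a_merged:.4f} B={idf_b_merged:.4f}")
```

Under these assumptions the two replicas report different idf values for the same term, so the same query can score the same document differently depending on which replica answers. Once the deleted documents are merged away the values agree, which is consistent with the report elsewhere in the thread that the top document became stable after an optimize.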

Consecutive calls to a query give different results

2017-09-06 Thread Webster Homer
I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4 replicas (total of 4 nodes). If I run the query multiple times I see three different top scoring results. No data load is running, all data has been committed. I get these three different hits with their scores: