Re: Cores and and ranking (search quality)

johnmunir Tue, 10 Mar 2015 10:24:12 -0700

Thanks Erick for trying to help, I really appreciate it.  Unfortunately, I'm 
still stuck.

There are times when one must know the inner workings and behavior of the 
software to make a design decision, and this is one of them.  If I knew the 
inner workings of Solr, I would not be asking.  In addition, I'm in the design 
process, so I'm not able to fully test.  Besides, my tests could be invalid 
because I may not set them up right, given my lack of understanding of the 
inner workings of Solr.

Given this, I hope you don't mind me asking again.

Say I have two cores: one has 10 docs, the other has 100,000 docs.  I then 
submit two docs that are 100% identical (with the exception of the unique-ID 
field, which is stored but not indexed), one to each core.  The question is, 
during search, will both of those docs rank near each other or not?  If so, 
this is great because it will behave the same as if I had one core and indexed 
both docs into this single core.  If not, which core's doc will rank higher, 
and how far apart will the two docs be from each other in the ranking?

Put another way: do docs from the smaller core (the one that has only 10 docs) 
rank higher or lower compared to docs from the larger core (the one with 
100,000 docs)?

Thanks!

-- MJ

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, March 10, 2015 11:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Cores and and ranking (search quality)

SOLR-1632 will certainly help. But trying to predict whether your core A or 
core B will appear first doesn't really seem like a good use of time. If you 
actually have a setup like you describe, add &debug=all to your query on both 
cores and you'll see all the gory detail of how the scores are calculated, 
providing a definitive answer in _your_ situation.
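
As a minimal sketch of that check, the snippet below builds the same debug 
query for both cores and shows why the per-core statistics can matter.  The 
host, port, and core names (core-A, core-B) are placeholders, not taken from a 
real setup.  The idf function assumes the classic Lucene TF-IDF similarity 
(idf = 1 + ln(numDocs / (docFreq + 1))); if a different similarity is 
configured, the numbers will differ, and the document frequencies used here 
are invented for illustration.  The debug output from your own cores remains 
the authoritative source.

```python
import math
from urllib.parse import urlencode


def debug_url(base_url: str, core: str, query: str) -> str:
    """Build a /select URL for one core with debug=all appended."""
    params = urlencode({"q": query, "debug": "all", "wt": "json"})
    return f"{base_url}/{core}/select?{params}"


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """idf under the classic TF-IDF similarity (an assumption about the config)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # hypothetical local Solr instance
    for core in ("core-A", "core-B"):    # hypothetical core names
        print(debug_url(base, core, "xyz"))

    # Same term, matched by one doc in each core, but different core sizes:
    # the idf factor differs because numDocs is counted per core.
    print(classic_idf(doc_freq=1, num_docs=10))
    print(classic_idf(doc_freq=1, num_docs=100_000))
```

Comparing the idf lines in the two debug responses should show whether, and 
by how much, the two otherwise identical docs are scored differently.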

Best,
Erick

On Mon, Mar 9, 2015 at 5:44 AM,  <johnmu...@aol.com> wrote:
> (reposting this to see if anyone can help)
>
>
> Help me understand this better (regarding ranking).
>
> Say I have two docs that are 100% identical with the exception of uid (which 
> is stored but not indexed).  In a single-core setup, I search "xyz" and 
> those 2 docs end up ranking as #1 and #2.  When I switch over to a two-core 
> setup, doc-A goes to core-A (which has 10 records) and doc-B goes to 
> core-B (which has 100,000 records).
>
> Now, are you saying that in a 2-core setup, if I search on "xyz" (just like in 
> the single-core setup), this time I will not see doc-A and doc-B as #1 and #2 
> in the ranking?  That is, are you saying doc-A may now be somewhere at the top 
> or bottom, far away from doc-B?  If so, which will be #1: doc-A off core-A 
> (that has 10 records) or doc-B off core-B (that has 100,000 records)?
>
> If I got all this right, are you saying SOLR-1632 will fix this issue such 
> that the end result will now be as if I had 1 core?
>
> - MJ
>
>
> -----Original Message-----
> From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
> Sent: Thursday, March 5, 2015 9:06 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Cores and and ranking (search quality)
>
> On Thu, 2015-03-05 at 14:34 +0100, johnmu...@aol.com wrote:
>> My question is this: if I put my data in multiple cores and use 
>> distributed search will the ranking be different if I had all my data 
>> in a single core?
>
> Yes, it will be different. The practical impact depends on how homogeneous 
> your data are across the shards and how large your shards are. If you have 
> small and dissimilar shards, your ranking will suffer a lot.
>
> Work is being done to remedy this:
> https://issues.apache.org/jira/browse/SOLR-1632
>
>> Also, will facet and more-like-this quality / result be the same?
>
> It is not formally guaranteed, but for most practical purposes, faceting on 
> multi-shards will give you the same results as single-shards.
>
> I don't know about more-like-this. My guess is that it will be affected in 
> the same way that standard searches are.
>
>> Also, reading the distributed search wiki
>> (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr 
>> does the search and result merging (all I have to do is issue a 
>> search), is this correct?
>
> Yes. From a user-perspective, searches are no different.
>
> - Toke Eskildsen, State and University Library, Denmark
>
