Re: Join Scoring

anand chandak Thu, 13 Feb 2014 04:38:17 -0800

Thanks Mike, that surely helps to clarify the difference.

On a related note, if we have to provide scoring support for Solr join instead of using Lucene join, what would be the best way to do that? There's one suggestion that David gave below: build a custom QParser and call Lucene's join (JoinUtil.createJoinQuery). Is there any other possibility, like modifying JoinQParserPlugin and building the scoring support natively? Would you recommend doing that? If yes, can you provide some high-level pointers?



Anand.


On 2/13/2014 5:07 PM, Michael McCandless wrote:

I suspect (not certain) one reason for the performance difference with
Solr vs Lucene joins is that Solr operates on a top-level reader?

This results in fast joins, but it means whenever you open a new
reader (NRT reader) there is a high cost to regenerate the top-level
data structures.

But if the app doesn't open NRT readers, or opens them rarely, perhaps
that cost is a good tradeoff to get faster joins.
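To make that tradeoff concrete, here is a simplified, hypothetical model in plain Java. It is not Solr's actual code; `TopLevelCache` is an invented name. A derived top-level structure is built once per reader and reused for every join against that reader. Opening a new reader (e.g. an NRT reopen) means the whole structure has to be rebuilt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical model of a top-level, per-reader cache: the expensive
 * structure is computed once for a given reader and reused until a
 * different reader is passed in, at which point it is fully rebuilt.
 */
public class TopLevelCache<R, S> {
    private final Function<R, S> builder; // expensive build over the whole index
    private R cachedFor;                  // reader the structure was built for
    private S cached;
    private int rebuilds;                 // how many full builds were paid for

    public TopLevelCache(Function<R, S> builder) {
        this.builder = builder;
    }

    /** Returns the structure for this reader, rebuilding only if the reader changed. */
    public synchronized S get(R reader) {
        if (reader != cachedFor) {        // identity check: a reopened reader is a new object
            cached = builder.apply(reader);
            cachedFor = reader;
            rebuilds++;
        }
        return cached;
    }

    public int rebuilds() {
        return rebuilds;
    }

    public static void main(String[] args) {
        // Each "reader" is just a snapshot list of terms in this model.
        List<String> reader1 = new ArrayList<>(List.of("a", "b"));
        TopLevelCache<List<String>, Integer> cache = new TopLevelCache<>(List::size);
        cache.get(reader1);
        cache.get(reader1);               // same reader: reused, no rebuild
        List<String> reader2 = new ArrayList<>(List.of("a", "b", "c")); // "reopen"
        cache.get(reader2);               // new reader: full rebuild
        System.out.println(cache.rebuilds()); // prints 2
    }
}
```

If readers are reopened rarely, the rebuild cost is amortized over many fast joins. A per-segment design would instead keep one entry per segment, so only new segments would need work after a reopen. That alternative is not shown here.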

Mike McCandless

http://blog.mikemccandless.com


On Thu, Feb 13, 2014 at 12:10 AM, anand chandak
<anand.chan...@oracle.com> wrote:

Re-posting...



Thanks,

Anand



On 2/12/2014 10:55 AM, anand chandak wrote:

Thanks David, really helpful response.

You mentioned that if we have to add scoring support in Solr, then a
possible approach would be to add a custom QueryParser, which might use
Lucene's join module. I have tried this approach and it makes the query
slow, I believe because it performs more searches.

I'm curious whether it is possible instead to enhance Solr's existing
JoinQParserPlugin and add the scoring support in the same class. Do you
think it's feasible and recommended? If yes, what would it take (at a high
level) in terms of code changes? Any pointers?


Thanks,

Anand


On 2/12/2014 10:31 AM, David Smiley (@MITRE.org) wrote:

Hi Anand.

Solr's JOIN query, {!join}, constant-scores. It's simpler and faster and
more memory efficient (particularly the worst-case memory use) to implement
the JOIN query without scoring, so that's why. Of course, you might want it
to score and pay whatever penalty is involved. For that you'll need to
write a Solr "QueryParser" that might use Lucene's "join" module, which has
scoring variants. I've taken this approach before. You asked a specific
question about the purpose of JoinScorer when it doesn't actually score.
Lucene's "Query" produces a "Weight", which in turn produces a "Scorer",
which is a DocIdSetIterator that also returns a score. So Queries have to
have a Scorer to match any document, even if the score is always 1.
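A simplified model of that relationship, written in plain Java rather than against Lucene's real classes (whose signatures vary by version). A scorer is a doc-ID iterator that can also report a score. A constant-score scorer simply reports the same value, here 1.0, for every match. `MiniScorer`, `ConstantScorer` and `ScorerDemo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class ScorerDemo {

    /** Hypothetical stand-in for a Scorer: a doc-ID iterator that also reports a score. */
    public interface MiniScorer {
        int NO_MORE_DOCS = Integer.MAX_VALUE; // sentinel, as in Lucene's DocIdSetIterator

        /** Advances to the next matching doc ID, or NO_MORE_DOCS when exhausted. */
        int nextDoc();

        /** Score of the current document. */
        float score();
    }

    /** Matches a fixed, sorted set of doc IDs and gives each one the same score. */
    public static class ConstantScorer implements MiniScorer {
        private final int[] docs;
        private final float constant;
        private int pos = -1;

        public ConstantScorer(int[] sortedDocs, float constant) {
            this.docs = sortedDocs;
            this.constant = constant;
        }

        @Override
        public int nextDoc() {
            pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        @Override
        public float score() {
            return constant; // no real scoring, but matches still come from a scorer
        }
    }

    /** Drains a scorer and collects the matched doc IDs, the way a collector would. */
    public static List<Integer> collect(MiniScorer scorer) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = scorer.nextDoc(); doc != MiniScorer.NO_MORE_DOCS; doc = scorer.nextDoc()) {
            hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        MiniScorer s = new ConstantScorer(new int[] {2, 5, 9}, 1.0f);
        System.out.println(collect(s)); // prints [2, 5, 9]
    }
}
```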

Solr does indeed have a lot of caching; that may be in play here when
comparing against a quick attempt at using Lucene directly. In particular,
the matching documents are likely to end up in Solr's DocumentCache.
Retrieving the stored fields returned in search results is one of the more
expensive things Lucene/Solr does.
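The effect of a document cache on repeated queries can be sketched with a small LRU map. This is a hypothetical illustration built on `java.util.LinkedHashMap` in access order, not Solr's actual DocumentCache implementation. `DocCache` is an invented name, and the loader function stands in for reading stored fields from the index.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical LRU cache of stored documents keyed by internal doc ID. */
public class DocCache<D> {
    private final IntFunction<D> loader; // stands in for the expensive stored-field read
    private final Map<Integer, D> lru;
    private int loads;                   // number of times the loader actually ran

    public DocCache(int maxSize, IntFunction<D> loader) {
        this.loader = loader;
        // accessOrder=true keeps entries ordered from least to most recently used
        this.lru = new LinkedHashMap<Integer, D>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, D> eldest) {
                return size() > maxSize; // evict the least recently used entry
            }
        };
    }

    public D get(int docId) {
        D doc = lru.get(docId);
        if (doc == null) {               // miss: pay the load cost once
            doc = loader.apply(docId);
            lru.put(docId, doc);
            loads++;
        }
        return doc;
    }

    public int loads() {
        return loads;
    }

    public static void main(String[] args) {
        DocCache<String> cache = new DocCache<>(2, id -> "doc-" + id);
        cache.get(1);
        cache.get(1);   // hit: served from the cache
        cache.get(2);
        cache.get(3);   // evicts doc 1, the least recently used
        cache.get(1);   // miss again: reloaded
        System.out.println(cache.loads()); // prints 4
    }
}
```

Running the same query twice against a warm cache skips the loader entirely. That is one reason a quick timing comparison against plain Lucene can favor Solr.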

I also think you noted that the fields on documents from the "from" side of
the query are not available to be returned in search results, just the "to"
side. Yup; that's true. To remedy this, you might write a Solr
SearchComponent that adds fields from the "from" side. That could be tricky
to do; it would probably need to re-run the from-side query but filtered to
the matching top-N documents being returned.
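One way to picture that SearchComponent idea is to use plain Java over in-memory maps. First take the join keys of the top-N "to" documents being returned. Then re-run the from-side match restricted to those keys, and group the matching from-side documents by key. `FromSideEnricher`, the field names, and the map-based documents are hypothetical. They only model the data flow and are not Solr's SearchComponent API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical model: attach "from"-side documents to the top-N "to"-side results of a join. */
public class FromSideEnricher {

    /**
     * @param topN      the "to" documents being returned (field name -> value)
     * @param toField   join key field on the "to" side
     * @param fromDocs  all "from" documents
     * @param fromField join key field on the "from" side
     * @param fromQuery the original from-side query, modeled as a predicate
     * @return join key -> matching from-side documents, only for returned keys
     */
    public static Map<String, List<Map<String, String>>> enrich(
            List<Map<String, String>> topN, String toField,
            List<Map<String, String>> fromDocs, String fromField,
            Predicate<Map<String, String>> fromQuery) {
        // 1. Collect the join keys of only the documents actually being returned.
        Set<String> keys = new HashSet<>();
        for (Map<String, String> doc : topN) {
            keys.add(doc.get(toField));
        }
        // 2. Re-run the from-side query, filtered to those keys.
        Map<String, List<Map<String, String>>> byKey = new LinkedHashMap<>();
        for (Map<String, String> from : fromDocs) {
            String key = from.get(fromField);
            if (keys.contains(key) && fromQuery.test(from)) {
                byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(from);
            }
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<Map<String, String>> from = List.of(
                Map.of("id", "f1", "parent", "p1", "color", "red"),
                Map.of("id", "f2", "parent", "p2", "color", "blue"),
                Map.of("id", "f3", "parent", "p1", "color", "green"));
        List<Map<String, String>> top = List.of(Map.of("id", "p1"));
        // from-side query: color is not blue
        Map<String, List<Map<String, String>>> out =
                enrich(top, "id", from, "parent", d -> !d.get("color").equals("blue"));
        System.out.println(out.get("p1").size()); // prints 2
    }
}
```

Restricting the second pass to the returned keys keeps the extra work proportional to the page size rather than to the full join.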

~ David


anand chandak wrote

Resending; could somebody please respond?


Thanks,

Anand


On 2/5/2014 6:26 PM, anand chandak wrote:
Hi,

I have a question on join scoring: why doesn't the Solr join query return
scores? Looking at the code, I see there's a JoinScorer defined in the
JoinQParserPlugin class. If it's not used for scoring, where is it actually
used?

Also, to evaluate the performance of the Solr join plugin vs. Lucene's
JoinUtil, I fired the same join query against the same data set and the same
schema, and in the results I always see a much lower QTime for Solr than for
Lucene. What is the reason behind this? Solr doesn't return scores; could
that cause so much difference?

My guess is that Solr has a very sophisticated caching mechanism and that
might be coming into play; is that true? Or is there a difference in the way
the JOIN happens in the two approaches?

If I understand correctly, both implementations use a two-pass approach:
first collect all the terms from the fromField, then return all documents
that have matching terms in the toField.
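The two-pass approach can be sketched in plain Java over in-memory documents, with an optional score mode that shows where scoring would fit in. The first pass collects the fromField terms of the matching "from" documents and, when scoring, aggregates their scores per term. The second pass returns the documents whose toField value is one of those terms. The `ScoreMode` values loosely mirror the None/Avg/Max/Total options of Lucene's join module. `TwoPassJoin` and `FromHit` are hypothetical, single-valued join fields are assumed, and this is a simplified model rather than the actual Solr or Lucene code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-pass join over in-memory docs with single-valued join fields. */
public class TwoPassJoin {

    public enum ScoreMode { NONE, AVG, MAX, TOTAL }

    /** A "from" document that matched the from-side query: its fromField term and score. */
    public static final class FromHit {
        final String term;
        final float score;

        public FromHit(String term, float score) {
            this.term = term;
            this.score = score;
        }
    }

    /**
     * @param fromHits from-side documents matched by the from query
     * @param toDocs   to-side doc ID -> its toField term, in doc ID order
     * @param mode     how per-term scores are combined (NONE = constant 1.0)
     * @return matching to-side doc IDs mapped to their join scores
     */
    public static Map<Integer, Float> join(List<FromHit> fromHits, Map<Integer, String> toDocs, ScoreMode mode) {
        // Pass 1: term -> {sum, max, count} over the matching from docs.
        Map<String, float[]> perTerm = new HashMap<>();
        for (FromHit hit : fromHits) {
            float[] agg = perTerm.computeIfAbsent(hit.term, t -> new float[] {0f, Float.NEGATIVE_INFINITY, 0f});
            agg[0] += hit.score;
            agg[1] = Math.max(agg[1], hit.score);
            agg[2] += 1f;
        }
        // Pass 2: keep the to docs whose term was collected.
        Map<Integer, Float> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : toDocs.entrySet()) {
            float[] agg = perTerm.get(e.getValue());
            if (agg == null) {
                continue; // no matching from doc, so not part of the join
            }
            float score;
            switch (mode) {
                case AVG: score = agg[0] / agg[2]; break;
                case MAX: score = agg[1]; break;
                case TOTAL: score = agg[0]; break;
                default: score = 1.0f; // NONE: constant score, as with {!join}
            }
            result.put(e.getKey(), score);
        }
        return result;
    }

    public static void main(String[] args) {
        List<FromHit> hits = List.of(new FromHit("a", 2f), new FromHit("a", 4f), new FromHit("b", 1f));
        Map<Integer, String> toDocs = new LinkedHashMap<>();
        toDocs.put(10, "a");
        toDocs.put(11, "c");
        toDocs.put(12, "b");
        System.out.println(join(hits, toDocs, ScoreMode.NONE)); // prints {10=1.0, 12=1.0}
        System.out.println(join(hits, toDocs, ScoreMode.AVG));  // prints {10=3.0, 12=1.0}
        System.out.println(join(hits, toDocs, ScoreMode.MAX));  // prints {10=4.0, 12=1.0}
    }
}
```

The constant-score path needs only the set of terms from pass 1. The scoring modes also have to carry per-term aggregates, which is part of why a non-scoring join is cheaper in memory.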

If somebody can shed some light on this, I would highly appreciate it.

Thanks,
Anand



-----
   Author:
http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context:
http://lucene.472066.n3.nabble.com/Join-Scoring-tp4115539p4116818.html
Sent from the Solr - User mailing list archive at Nabble.com.
