Re: Cross index join query performance

Peter Keegan Fri, 27 Sep 2013 13:46:12 -0700

Hi Joel,

I tried this patch and it is quite a bit faster. Using the same query on a
larger index (500K docs), the 'join' QTime was 1500 msec, and the 'hjoin'
QTime was 100 msec! This held true for both large and small result sets.
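
The general idea behind a hash-based join on int keys can be sketched as follows. This is a minimal illustration in plain Java, not the SOLR-4787 patch code: the class name HashSetJoinSketch, the method join, and the array-based inputs are all hypothetical stand-ins for whatever the real parser does internally. It assumes the join keys of the "from" hits and the per-document keys of the "to" side are already available as int arrays.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: NOT the SOLR-4787 implementation. All names are hypothetical.
final class HashSetJoinSketch {

    /**
     * Returns the positions in toKeys whose key also appears in fromKeys.
     *
     * fromKeys: join-key values of the documents matched on the "from" side.
     * toKeys:   join-key value of each document on the "to" side, by position.
     */
    static List<Integer> join(int[] fromKeys, int[] toKeys) {
        // Build phase: one pass over the "from" hits, collecting their keys.
        Set<Integer> keys = new HashSet<>(Math.max(16, fromKeys.length * 2));
        for (int key : fromKeys) {
            keys.add(key);
        }

        // Probe phase: one pass over the "to" side, keeping positions whose key was seen.
        List<Integer> matches = new ArrayList<>();
        for (int doc = 0; doc < toKeys.length; doc++) {
            if (keys.contains(toKeys[doc])) {
                matches.add(doc);
            }
        }
        return matches;
    }
}
```

For example, HashSetJoinSketch.join(new int[]{7, 3, 9}, new int[]{1, 3, 5, 7, 7}) returns the positions [1, 3, 4]. The work in this sketch grows with the number of "from" hits plus the number of "to" documents, rather than with the number of terms in the join field.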


A few notes: the patch didn't compile with 4.3 because of the
SolrCore.getLatestSchema call (which I worked around), and the package name
should be:
<queryParser name="hjoin"
class="org.apache.solr.search.joins.HashSetJoinQParserPlugin"/>

Unfortunately, I just learned that our uniqueKey may have to be an
alphanumeric string instead of an int, so I'm not out of the woods yet.

Good stuff - thanks.

Peter


On Thu, Sep 26, 2013 at 6:49 PM, Joel Bernstein <joels...@gmail.com> wrote:

> It looks like you are using int join keys so you may want to check out
> SOLR-4787, specifically the hjoin and bjoin.
>
> These perform well when you have a large number of results from the
> fromIndex. If you have a small number of results in the fromIndex the
> standard join will be faster.
>
>
> On Wed, Sep 25, 2013 at 3:39 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
>
> > I forgot to mention - this is Solr 4.3
> >
> > Peter
> >
> >
> >
> > On Wed, Sep 25, 2013 at 3:38 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> >
> > > I'm doing a cross-core join query and the join query is 30X slower than
> > > each of the 2 individual queries. Here are the queries:
> > >
> > > Main query: http://localhost:8983/solr/mainindex/select?q=title:java
> > > QTime: 5 msec
> > > hit count: 1000
> > >
> > > Sub query: http://localhost:8983/solr/subindex/select?q=+fld1:[0.1 TO 0.3]
> > > QTime: 4 msec
> > > hit count: 25K
> > >
> > > Join query:
> > > http://localhost:8983/solr/mainindex/select?q=title:java&fq={!join fromIndex=mainindex toIndex=subindex from=docid to=docid}fld1:[0.1 TO 0.3]
> > > QTime: 160 msec
> > > hit count: 205
> > >
> > > Here are the index spec's:
> > >
> > > mainindex size: 117K docs, 1 segment
> > > mainindex schema:
> > >    <field name="docid" type="int" indexed="true" stored="true"
> > > required="true" multiValued="false" />
> > >    <field name="title" type="text_en_splitting" indexed="true"
> > > stored="true" multiValued="false" />
> > >    <uniqueKey>docid</uniqueKey>
> > >
> > > subindex size: 117K docs, 1 segment
> > > subindex schema:
> > >    <field name="docid" type="int" indexed="true" stored="true"
> > > required="true" multiValued="false" />
> > >    <field name="fld1" type="float" indexed="true" stored="true"
> > > required="false" multiValued="false" />
> > >    <uniqueKey>docid</uniqueKey>
> > >
> > > With debugQuery=true I see:
> > >   "debug":{
> > >     "join":{
> > >       "{!join from=docid to=docid fromIndex=subindex}fld1:[0.1 TO 0.3]":{
> > >         "time":155,
> > >         "fromSetSize":24742,
> > >         "toSetSize":24742,
> > >         "fromTermCount":117810,
> > >         "fromTermTotalDf":117810,
> > >         "fromTermDirectCount":117810,
> > >         "fromTermHits":24742,
> > >         "fromTermHitsTotalDf":24742,
> > >         "toTermHits":24742,
> > >         "toTermHitsTotalDf":24742,
> > >         "toTermDirectCount":24627,
> > >         "smallSetsDeferred":115,
> > >         "toSetDocsAdded":24742}},
> > >
> > > Via profiler and debugger, I see 150 msec spent in the outer
> > > 'while(term!=null)' loop in: JoinQueryWeight.getDocSet(). This seems like
> > > a lot of time to join the bitsets. Does this seem right?
> > >
> > > Peter
> > >
> > >
> >
>
>
>
> --
> Joel Bernstein
> Professional Services LucidWorks
>
