Hi Joel, I tried this patch and it is quite a bit faster. Using the same query on a larger index (500K docs), the 'join' QTime was 1500 msec, and the 'hjoin' QTime was 100 msec! This was true for both large and small result sets.
A few notes: the patch didn't compile with 4.3 because of the SolrCore.getLatestSchema call (which I worked around), and the package name should be:

<queryParser name="hjoin" class="org.apache.solr.search.joins.HashSetJoinQParserPlugin"/>

Unfortunately, I just learned that our uniqueKey may have to be an alphanumeric string instead of an int, so I'm not out of the woods yet.

Good stuff - thanks.

Peter

On Thu, Sep 26, 2013 at 6:49 PM, Joel Bernstein <joels...@gmail.com> wrote:

> It looks like you are using int join keys so you may want to check out
> SOLR-4787, specifically the hjoin and bjoin.
>
> These perform well when you have a large number of results from the
> fromIndex. If you have a small number of results in the fromIndex the
> standard join will be faster.
>
>
> On Wed, Sep 25, 2013 at 3:39 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
>
> > I forgot to mention - this is Solr 4.3
> >
> > Peter
> >
> > On Wed, Sep 25, 2013 at 3:38 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> >
> > > I'm doing a cross-core join query and the join query is 30X slower than
> > > each of the 2 individual queries.
> > > Here are the queries:
> > >
> > > Main query: http://localhost:8983/solr/mainindex/select?q=title:java
> > > QTime: 5 msec
> > > hit count: 1000
> > >
> > > Sub query: http://localhost:8983/solr/subindex/select?q=+fld1:[0.1 TO 0.3]
> > > QTime: 4 msec
> > > hit count: 25K
> > >
> > > Join query:
> > > http://localhost:8983/solr/mainindex/select?q=title:java&fq={!join fromIndex=mainindex toIndex=subindex from=docid to=docid}fld1:[0.1 TO 0.3]
> > > QTime: 160 msec
> > > hit count: 205
> > >
> > > Here are the index spec's:
> > >
> > > mainindex size: 117K docs, 1 segment
> > > mainindex schema:
> > > <field name="docid" type="int" indexed="true" stored="true"
> > >        required="true" multiValued="false" />
> > > <field name="title" type="text_en_splitting" indexed="true"
> > >        stored="true" multiValued="false" />
> > > <uniqueKey>docid</uniqueKey>
> > >
> > > subindex size: 117K docs, 1 segment
> > > subindex schema:
> > > <field name="docid" type="int" indexed="true" stored="true"
> > >        required="true" multiValued="false" />
> > > <field name="fld1" type="float" indexed="true" stored="true"
> > >        required="false" multiValued="false" />
> > > <uniqueKey>docid</uniqueKey>
> > >
> > > With debugQuery=true I see:
> > > "debug":{
> > >   "join":{
> > >     "{!join from=docid to=docid fromIndex=subindex}fld1:[0.1 TO 0.3]":{
> > >       "time":155,
> > >       "fromSetSize":24742,
> > >       "toSetSize":24742,
> > >       "fromTermCount":117810,
> > >       "fromTermTotalDf":117810,
> > >       "fromTermDirectCount":117810,
> > >       "fromTermHits":24742,
> > >       "fromTermHitsTotalDf":24742,
> > >       "toTermHits":24742,
> > >       "toTermHitsTotalDf":24742,
> > >       "toTermDirectCount":24627,
> > >       "smallSetsDeferred":115,
> > >       "toSetDocsAdded":24742}},
> > >
> > > Via profiler and debugger, I see 150 msec spent in the outer
> > > 'while(term!=null)' loop in: JoinQueryWeight.getDocSet(). This seems like a
> > > lot of time to join the bitsets. Does this seem right?
> > >
> > > Peter
> > >
> >
>
> --
> Joel Bernstein
> Professional Services LucidWorks
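
For anyone reading this thread later, here is a rough Python sketch of why the two approaches can differ so much when the join key is a unique int. It is a toy model only, not the code in Solr or in the SOLR-4787 patch: the function names (invert, term_enum_join, hash_set_join) and the list-based "indexes" are made up for illustration. The first function mimics a term-at-a-time join, which, as the 'while(term!=null)' loop and the fromTermCount of 117810 above suggest, visits every distinct key in the from field. The second collects the keys of the from-side hits into a hash set and probes it once per to-side document.

```python
from collections import defaultdict
from typing import Dict, List, Set


def invert(keys_by_doc: List[int]) -> Dict[int, List[int]]:
    """Build a key -> doc numbers map from a per-document key array."""
    postings: Dict[int, List[int]] = defaultdict(list)
    for doc, key in enumerate(keys_by_doc):
        postings[key].append(doc)
    return postings


def term_enum_join(from_keys: List[int], from_hits: Set[int],
                   to_keys: List[int]) -> Set[int]:
    """Toy term-at-a-time join: one loop iteration per distinct from-side key."""
    from_postings = invert(from_keys)
    to_postings = invert(to_keys)
    joined: Set[int] = set()
    for term, docs in from_postings.items():
        # Keep the term only if at least one of its docs matched the from query.
        if any(doc in from_hits for doc in docs):
            joined.update(to_postings.get(term, []))
    return joined


def hash_set_join(from_keys: List[int], from_hits: Set[int],
                  to_keys: List[int]) -> Set[int]:
    """Toy hash-set join: gather keys of the from-side hits, then probe them."""
    wanted = {from_keys[doc] for doc in from_hits}
    return {doc for doc, key in enumerate(to_keys) if key in wanted}


# Hypothetical data: index position = internal doc number, value = docid key.
from_keys = [10, 11, 12, 13, 14]
to_keys = [14, 13, 12, 11, 10]
from_hits = {0, 2}  # from-side docs whose keys are 10 and 12

print(sorted(term_enum_join(from_keys, from_hits, to_keys)))  # [2, 4]
print(sorted(hash_set_join(from_keys, from_hits, to_keys)))   # [2, 4]
```

In this model both functions return the same documents; the difference is in how much per-key work the first one does when every document has its own key, which is the uniqueKey situation described above. The sketch ignores segment handling, caching, and the small-result-set case where the standard join is reported to be faster.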