I'll try that. Thanks, Upayavira.
________________________________________
From: Upayavira [u...@odoko.co.uk]
Sent: 09 September 2015 19:30
To: solr-user@lucene.apache.org
Subject: Re: Solr Join between two indexes taking too long.

I've never reviewed that join query debug info - very interesting.
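
One way to read those counters is against a toy model of the join. The sketch below is an assumption: a simplified Python model, not Solr's JoinQuery source. It walks every unique term in the from field, checks whether any doc matched by the inner query (e.g. universe:55AL86) uses that term, and only then looks the term up on the to side. The counter names are borrowed from the debug output. Under this model fromTermCount depends only on the number of unique values in the from field, not on fromSetSize, which would fit the 11837043 reported in all three runs.

```python
from collections import defaultdict


def join_doc_ids(from_docs, from_set, to_docs):
    """Toy model of a term-walking join (an assumption, not Solr's code).

    from_docs / to_docs map doc id -> join-field value (e.g. longValue).
    from_set is the set of from-side doc ids matched by the inner query,
    e.g. universe:55AL86 on indexB.
    Returns (matching to-side doc ids, counters named after the debug keys).
    """
    # Build term -> doc-id inverted indexes for both sides.
    from_index = defaultdict(set)
    for doc_id, value in from_docs.items():
        from_index[value].add(doc_id)
    to_index = defaultdict(set)
    for doc_id, value in to_docs.items():
        to_index[value].add(doc_id)

    counters = {"fromTermCount": 0, "fromTermHits": 0, "toTermHits": 0}
    result = set()
    # Visit EVERY unique term in the from field, however small from_set is.
    for term in sorted(from_index):
        counters["fromTermCount"] += 1
        # Is this term used by at least one doc the inner query matched?
        if from_index[term] & from_set:
            counters["fromTermHits"] += 1
            # Look the same term up on the to side.
            if term in to_index:
                counters["toTermHits"] += 1
                result |= to_index[term]
    return result, counters
```

With four unique from-side values, the model reports fromTermCount == 4 whether from_set holds one doc or all of them. Scaled up to roughly 11.8 million unique values, that is one possible reading of why QTime barely moves between a 6-id universe and a 424088-id one.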

Break it all down: run the universe: query directly against indexB. Then
look at the results and manually construct a query on indexA using
those values. If none of that helps, either use a profiler to see where
Solr is spending most of its time, or connect a debugger (with the
source code), such as Eclipse, to Solr and see where it is spending its
time.
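
The two manual steps could look something like the sketch below. It only builds the request URLs. Assumptions, all hypothetical: the host and port are taken from the original query, the join field is called longValue, rows=500000 is chosen to cover the largest universe mentioned, and the {!terms} query parser is available (it is in Solr 4.10 and later; on older versions a large OR query on longValue would stand in for it).

```python
from urllib.parse import urlencode

SOLR = "http://127.0.0.1:8080/solr"  # host/port as in the original query


def from_side_url(universe, rows=500000):
    # Step 1: run the inner query directly on indexB and return only the
    # join field. rows is an assumed upper bound on the universe size.
    params = {"q": f"universe:{universe}", "fl": "longValue",
              "rows": rows, "wt": "json"}
    return f"{SOLR}/indexB/select?{urlencode(params)}"


def to_side_url(values):
    # Step 2: query indexA for the collected values directly, skipping the
    # join. Assumes the {!terms} query parser exists in this Solr version.
    terms = ",".join(str(v) for v in sorted(set(values)))
    params = {"q": f"{{!terms f=longValue}}{terms}", "wt": "json"}
    return f"{SOLR}/indexA/select?{urlencode(params)}"
```

Comparing the QTime of each step against the join's QTime should show which half is slow. Fetching hundreds of thousands of rows in one request is itself expensive, so paging step 1 may be needed for the largest universes.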

Just some thoughts.

Upayavira

On Wed, Sep 9, 2015, at 04:24 PM, Russell Taylor wrote:
> Hi Upayavira,
> Here are a couple examples with debugQuery set.
> I misled Mikhail: the query times do get longer as the list of ids
> gets bigger.
>
> Can you see why, when indexB has only 6 ids in its list, it still
> takes 46 seconds?
>
> Ids=6
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 46849,
>     "params": {
>       "debugQuery": "true",
>       "indent": "true",
>       "start": "0",
>       "q": "{!join from=sedolKey to=sedolKey fromIndex=indexB}universe:55AL86",
>       "_": "1441810031117",
>       "wt": "json"
>     }
>   },
>   "response": {
>     "numFound": 0,
>     "start": 0,
>     "docs": []
>   },
>   "debug": {
>     "join": {
>       "{!join from=sedolKey to=sedolKey fromIndex=indexB}universe:55AL86": {
>         "time": 46848,
>         "fromSetSize": 6,
>         "toSetSize": 0,
>         "fromTermCount": 11837043,
>         "fromTermTotalDf": 11837043,
>         "fromTermDirectCount": 11837043,
>         "fromTermHits": 6,
>         "fromTermHitsTotalDf": 6,
>         "toTermHits": 0,
>         "toTermHitsTotalDf": 0,
>         "toTermDirectCount": 0,
>         "smallSetsDeferred": 0,
>         "toSetDocsAdded": 0
>       }
>     },
>     "rawquerystring": "{!join from=longValue to=longValue fromIndex=indexB}universe:55AL86",
>     "querystring": "{!join from=longValue to=longValue fromIndex=indexB}universe:55AL86",
>     "parsedquery": "JoinQuery({!join from=longValue to=longValue fromIndex=indexB}universe:55AL86)",
>     "parsedquery_toString": "{!join from=longValue to=longValue fromIndex=indexB}universe:55AL86",
>     "explain": {},
>     "QParser": "",
>     "timing": {
>       "time": 46849,
>       "prepare": {
>         "time": 0,
>         "query": {
>           "time": 0
>         },
>         "facet": {
>           "time": 0
>         },
>         "mlt": {
>           "time": 0
>         },
>         "highlight": {
>           "time": 0
>         },
>         "stats": {
>           "time": 0
>         },
>         "expand": {
>           "time": 0
>         },
>         "debug": {
>           "time": 0
>         }
>       },
>       "process": {
>         "time": 46848,
>         "query": {
>           "time": 46848
>         },
>         "facet": {
>           "time": 0
>         },
>         "mlt": {
>           "time": 0
>         },
>         "highlight": {
>           "time": 0
>         },
>         "stats": {
>           "time": 0
>         },
>         "expand": {
>           "time": 0
>         },
>         "debug": {
>           "time": 0
>         }
>       }
>     }
>   }
> }
>
> ###########################################
> Ids=298
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 51570,
>     "params": {
>       "debugQuery": "true",
>       "indent": "true",
>       "q": "{!join from=longValue to=longValue fromIndex=indexB}universe:16XO52",
>       "_": "1441810442921",
>       "wt": "json"
>     }
>   },
>   "response": {
>     "numFound": 0,
>     "start": 0,
>     "docs": []
>   },
>   "debug": {
>     "join": {
>       "{!join from=longValue to=longValue fromIndex=indexB}universe:16XO52": {
>         "time": 51570,
>         "fromSetSize": 298,
>         "toSetSize": 0,
>         "fromTermCount": 11837043,
>         "fromTermTotalDf": 11837043,
>         "fromTermDirectCount": 11837043,
>         "fromTermHits": 298,
>         "fromTermHitsTotalDf": 298,
>         "toTermHits": 0,
>         "toTermHitsTotalDf": 0,
>         "toTermDirectCount": 0,
>         "smallSetsDeferred": 0,
>         "toSetDocsAdded": 0
>       }
>     },
>     "rawquerystring": "{!join from=longValue to=longValue fromIndex=indexB}universe:16XO52",
>     "querystring": "{!join from=longValue to=longValue fromIndex=indexB}universe:16XO52",
>     "parsedquery": "JoinQuery({!join from=longValue to=longValue fromIndex=indexB}universe:16XO52)",
>     "parsedquery_toString": "{!join from=longValue to=longValue fromIndex=indexB}universe:16XO52",
>     "explain": {},
>     "QParser": "",
>     "timing": {
>       "time": 51570,
>       "prepare": {
>         "time": 0,
>         "query": {
>           "time": 0
>         },
>         "facet": {
>           "time": 0
>         },
>         "mlt": {
>           "time": 0
>         },
>         "highlight": {
>           "time": 0
>         },
>         "stats": {
>           "time": 0
>         },
>         "expand": {
>           "time": 0
>         },
>         "debug": {
>           "time": 0
>         }
>       },
>       "process": {
>         "time": 51570,
>         "query": {
>           "time": 51570
>         },
>         "facet": {
>           "time": 0
>         },
>         "mlt": {
>           "time": 0
>         },
>         "highlight": {
>           "time": 0
>         },
>         "stats": {
>           "time": 0
>         },
>         "expand": {
>           "time": 0
>         },
>         "debug": {
>           "time": 0
>         }
>       }
>     }
>   }
> }
>
> ###################################################################
> Ids=424088
> "debug": {
>     "join": {
>       "{!join from=longValue to=longValue fromIndex=indexB}universe:LARGE": {
>         "time": 44386,
>         "fromSetSize": 424088,
>         "toSetSize": 892314,
>         "fromTermCount": 11837043,
>         "fromTermTotalDf": 11837043,
>         "fromTermDirectCount": 11837043,
>         "fromTermHits": 420365,
>         "fromTermHitsTotalDf": 420365,
>         "toTermHits": 57074,
>         "toTermHitsTotalDf": 944597,
>         "toTermDirectCount": 55722,
>         "smallSetsDeferred": 1352,
>         "toSetDocsAdded": 892314
>       }
>     },
>     "rawquerystring": "{!join from=longValue to=longValue fromIndex=indexB}universe:LARGE",
>     "querystring": "{!join from=longValue to=longValue fromIndex=indexB}universe:LARGE",
>     "parsedquery": "JoinQuery({!join from=longValue to=longValue fromIndex=indexB}universe:LARGE)",
>     "parsedquery_toString": "{!join from=longValue to=longValue fromIndex=indexB}universe:LARGE",
>     "explain": {
>       "2000000076769983": "\n1.0 = (MATCH) org.apache.solr.search.JoinQuery$JoinQueryWeight@483489aa , product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
>       "2000000076769984": "\n1.0 = (MATCH) org.apache.solr.search.JoinQuery$JoinQueryWeight@118f9aef , product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
>       "2000000076769985": "\n1.0 = (MATCH) org.apache.solr.search.JoinQuery$JoinQueryWeight@7a5d18ac , product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
>       "2000000076769986": "\n1.0 = (MATCH) org.apache.solr.search.JoinQuery$JoinQueryWeight@6d601adc , product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
>       "2000000076769987": "\n1.0 = (MATCH) org.apache.solr.search.JoinQuery$JoinQueryWeight@2e262f31 , product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
>       "2000000076769988": "\n1.0 = (MATCH) org.apache.solr.search.JoinQuery$JoinQueryWeight@4b200302 , product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
>       "2000000076769989": "\n1.0 = (MATCH) org.apache.solr.search.JoinQuery$JoinQueryWeight@569910d2 , product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
>       "2000000076770006": "\n1.0 = (MATCH) org.apache.solr.search.JoinQuery$JoinQueryWeight@635a3843 , product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
>       "2000000076770007": "\n1.0 = (MATCH) org.apache.solr.search.JoinQuery$JoinQueryWeight@5b438a92 , product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
>       "2000000076770029": "\n1.0 = (MATCH) org.apache.solr.search.JoinQuery$JoinQueryWeight@31d9c4c , product of:\n  1.0 = boost\n  1.0 = queryNorm\n"
>     },
>     "QParser": "",
>     "timing": {
>       "time": 480859,
>       "prepare": {
>         "time": 0,
>         "query": {
>           "time": 0
>         },
>         "facet": {
>           "time": 0
>         },
>         "mlt": {
>           "time": 0
>         },
>         "highlight": {
>           "time": 0
>         },
>         "stats": {
>           "time": 0
>         },
>         "expand": {
>           "time": 0
>         },
>         "debug": {
>           "time": 0
>         }
>       },
>       "process": {
>         "time": 480859,
>         "query": {
>           "time": 43737
>         },
>         "facet": {
>           "time": 0
>         },
>         "mlt": {
>           "time": 0
>         },
>         "highlight": {
>           "time": 0
>         },
>         "stats": {
>           "time": 0
>         },
>         "expand": {
>           "time": 0
>         },
>         "debug": {
>           "time": 437122
>         }
>       }
>     }
>   }
> }
>
>
> Thanks
>
>
> Russ.
> -----Original Message-----
> From: Upayavira [mailto:u...@odoko.co.uk]
> Sent: 09 September 2015 13:02
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Join between two indexes taking too long.
>
> To explain what a join does:
>
> It goes over to the joined index and executes a query. This results in
> a list of "ids" that is then used to search the main index. The more
> of these ids there are, the worse the performance will be. Thus, if
> 100k documents match in the joined core, you will be doing a 100k-term
> search, which will invariably be painful, because the more terms a
> search includes, the slower it is.
>
> How many matching docs do you have on the other side of your query?
>
> Upayavira
>
> On Tue, Sep 8, 2015, at 02:09 PM, Russell Taylor wrote:
> > Hi,
> >  I hope somebody can help.
> >
> > We have two indexes: one holds the descriptive data and the other
> > holds lists of docs of a certain type (called universes in our
> > world). They need to be joined to show a list of data from indexA
> > where indexB, filtered by universe:value, has matching longs (the
> > join field).
> >
> > At the moment the query takes 55 seconds and we need to get it under
> > a second; any help is most appreciated.
> >
> > INDEXES:
> >
> > Index A (primary index)
> > 31 million docs, each with an alphanumeric value converted to a long,
> > with a possible 10 million unique values.
> >
> > Index B (the joined index)
> > 250 million docs, each with an alphanumeric value converted to a
> > long, with a possible 10 million unique values.
> > IndexB is filtered by universe, which can match between 1 and 500,000
> > docs.
> >
> > QUERY:
> > http://127.0.0.1:8080/solr/indexA/select?q={!join+from=longValue+to=longValue+fromIndex=IndexB}universe:universeValue
> >
> > QTime is 55 seconds whether the universe has 5 docs or 500,000 docs.
> >
> >
> >
> > Thanks
> >
> >
> > Russ.
> >
> >
> > *******************************************************
> > This message (including any files transmitted with it) may contain
> > confidential and/or proprietary information, is the property of
> > Interactive Data Corporation and/or its subsidiaries, and is directed
> > only to the addressee(s). If you are not the designated recipient or
> > have reason to believe you received this message in error, please
> > delete this message from your system and notify the sender
> > immediately. An unintended recipient's disclosure, copying,
> > distribution, or use of this message or any attachments is prohibited and 
> > may be unlawful.
> > *******************************************************
>
>
>

