I think searching is the bottleneck. Solr/Lucene is slow when
maxBooleanClauses is large enough.

In my previous example, I should have said the large query is broken
into 100 smaller ones, not 10.

Since we still want facet counts with this large query, is there any way
one can accurately aggregate the facet counts coming back from multiple
threads as you suggested?
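
To make the accuracy concern concrete, here is a minimal sketch in plain
Python. It is not Solr code: the function names and the data shapes are
made up for illustration, and it assumes a single-valued facet field.
Summing per-sub-query facet counts double counts any document that
matches more than one sub-query, so an exact merge has to deduplicate by
document id before counting.

```python
from collections import Counter

def merge_facets_naive(partial_counts):
    """Sum facet counts from each sub-query (hypothetical helper).

    Overcounts documents that match more than one sub-query.
    """
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return dict(total)

def merge_facets_by_doc(partial_hits):
    """Merge hits keyed by document id, then count facet values once.

    partial_hits: iterable of dicts mapping doc id -> facet value
    (assumed shape; a single-valued facet field is assumed).
    """
    seen = {}
    for hits in partial_hits:
        seen.update(hits)  # a repeated doc id overwrites itself
    return dict(Counter(seen.values()))

# Document d2 matches both sub-queries.
hits_a = {"d1": "red", "d2": "blue"}
hits_b = {"d2": "blue", "d3": "red"}
counts_a = {"red": 1, "blue": 1}
counts_b = {"red": 1, "blue": 1}

naive = merge_facets_naive([counts_a, counts_b])
exact = merge_facets_by_doc([hits_a, hits_b])
```

In this toy case the naive sum reports "blue" twice, while the merge by
document id counts d2 only once.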

Thanks a lot for your reply,

Jeff

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Yonik Seeley
Sent: Wednesday, September 23, 2009 4:39 PM
To: [email protected]
Subject: Re: large OR-boolean query

On Wed, Sep 23, 2009 at 4:26 PM, Luo, Jeff <[email protected]> wrote:
> We are experimenting with a parallel approach to issuing a large
> OR-Boolean query, e.g., keywords:(1 OR 2 OR 3 OR ... OR 102400),
> against several solr shards.
>
> The way we are trying is to break the large query into smaller ones,
> e.g., the example above can be broken into 10 small queries:
> keywords:(1 OR 2 OR 3 OR ... OR 1024),
> keywords:(1025 OR 1026 OR 1027 OR ... OR 2048), etc.
>
> Now each shard will get 10 requests and the master will merge the
> results coming back from each shard, similar to the regular
> distributed search.

You're going to end up with a lot of custom code I think.
Where's the bottleneck... searching or faceting?

If faceting is the bottleneck, making an implementation that utilized
multiple threads would be one of the best ways.
If searching, you could develop a custom query type (QParserPlugin)
that handled your type of queries and split them across multiple
threads.
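
As a rough illustration of the splitting idea only, here is a sketch in
plain Python. It is not a QParserPlugin and uses no Solr or Lucene API:
`search_chunk` is a hypothetical callable standing in for whatever runs
one small OR query and returns the matching document ids, and the chunk
size and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(terms, size):
    """Split the term list into consecutive slices of at most `size`."""
    return [terms[i:i + size] for i in range(0, len(terms), size)]

def parallel_or_search(terms, search_chunk, chunk_size=1024, workers=8):
    """Run one small OR query per chunk on a thread pool.

    search_chunk is a hypothetical callable: list of terms -> iterable
    of matching doc ids. Results are unioned, so a document matched by
    several chunks appears once.
    """
    matched = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for doc_ids in pool.map(search_chunk, chunk(terms, chunk_size)):
            matched.update(doc_ids)
    return matched
```

With 102400 terms and a chunk size of 1024 this produces 100 sub-queries.
Taking the union of document ids, rather than concatenating results,
keeps the merged result equivalent to the single large OR query.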

-Yonik
http://www.lucidimagination.com
