Re: Distributed search cross cluster

Charlie Hull Wed, 31 Jan 2018 01:06:51 -0800

On 30/01/2018 16:09, Jan Høydahl wrote:

Hi,


A customer has 10 separate SolrCloud clusters, with the same schema across all
but different content. Now they want users in each location to be able to
federate a search across all locations. Each location is 100% independent, with
separate ZK etc. Bandwidth and latency between the clusters are not an issue;
they are actually in the same physical datacenter.

Now my first thought was to use a custom &shards parameter and let the
receiving node fan out to all shards of all clusters. We’d need to contact the
ZK for each environment, find all shards and replicas participating in the
collection, and then construct the shards=A1|A2,B1|B2… string, which would be
quite big, but if we get it right, it should “just work”.
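
A minimal sketch of how such a shards string could be assembled, assuming the
shard and replica addresses have already been read from each cluster's ZK (for
example via each cluster's CLUSTERSTATUS call). The function name, cluster
names, hosts and core names below are hypothetical; only the format itself
(commas between shards, pipes between replicas of the same shard) comes from
Solr's distributed request syntax.

```python
def build_shards_param(clusters):
    """Build a value for Solr's `shards` parameter.

    `clusters` maps a cluster name to {shard name: [replica address, ...]}.
    Replicas of one shard are joined with "|" (alternatives), and the
    resulting shard groups are joined with "," (all are queried).
    """
    groups = []
    for cluster_name in sorted(clusters):
        shards = clusters[cluster_name]
        for shard_name in sorted(shards):
            replicas = shards[shard_name]
            if not replicas:
                # A shard with no replica cannot be queried at all.
                raise ValueError(f"no replicas for {cluster_name}/{shard_name}")
            groups.append("|".join(replicas))
    return ",".join(groups)


if __name__ == "__main__":
    # Hypothetical topology: two clusters, one shard each, two replicas per shard.
    topology = {
        "clusterA": {
            "shard1": [
                "a1.example:8983/solr/docs_shard1_replica_n1",
                "a2.example:8983/solr/docs_shard1_replica_n2",
            ],
        },
        "clusterB": {
            "shard1": [
                "b1.example:8983/solr/docs_shard1_replica_n1",
                "b2.example:8983/solr/docs_shard1_replica_n2",
            ],
        },
    }
    print(build_shards_param(topology))
```

With 10 clusters the resulting value grows with the total number of shards and
replicas, so sending the query as a POST may be worth considering.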

Now, my question is whether there are other smarter ways that would leave it up
to existing Solr logic to select shards and load balance, that would also take
into account any shard.keys/_route_ info etc. I thought of these:
   * &collection=collA,collB - but it only supports collections local to one cloud
   * Create a collection ALIAS to point to all 10 - but same here, only local to one cluster
   * Streaming expression top(merge(search(q=,zkHost=blabla))) - but we want it with pure search API
   * Write a custom ShardHandler plugin that knows about all clusters - but this is complex stuff :)
   * Write a custom SearchComponent plugin that knows about all clusters and adds the &shards= param

Another approach would be for the originating cluster to fan out just ONE
request to each of the other clusters and then write some SearchComponent to
merge those responses. That would let us query the other clusters using one LB
IP address instead of requiring full visibility to all Solr nodes of all
clusters, but if we don’t need that isolation, that extra merge code seems
fairly complex.
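
To illustrate the merge step of this second approach, here is a rough sketch
that merges per-cluster result lists into one top-N list by score. It is plain
client-side Python rather than an actual SearchComponent, and the function
name, the document dicts and the "score" field layout are assumptions made for
the example. It also assumes each cluster returns its documents already sorted
by descending score; scores computed independently by separate clusters are
not guaranteed to be directly comparable.

```python
import heapq
from itertools import islice


def merge_cluster_results(per_cluster_docs, rows=10):
    """Merge result lists from several clusters into one top-`rows` list.

    `per_cluster_docs` is a list of lists; each inner list holds dicts with
    a "score" key and must already be sorted by descending score.
    """
    merged = heapq.merge(
        *per_cluster_docs,
        key=lambda doc: doc["score"],
        reverse=True,  # inputs are sorted from highest to lowest score
    )
    return list(islice(merged, rows))


if __name__ == "__main__":
    # Hypothetical responses from two clusters.
    cluster_a = [{"id": "a1", "score": 3.0}, {"id": "a2", "score": 1.0}]
    cluster_b = [{"id": "b1", "score": 2.0}]
    for doc in merge_cluster_results([cluster_a, cluster_b], rows=2):
        print(doc["id"], doc["score"])
```

A real implementation would also have to merge facets, highlighting and other
response sections, which is where most of the extra complexity would sit.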

So far I opt for the custom SearchComponent and &shards= param approach. Any
useful input from someone who tried a similar approach would be priceless!


Hi Jan,

We actually looked at this for the BioSolr project - a SolrCloud of SolrClouds. Unfortunately the funding didn't appear for the project so we didn't take it any further than some rough ideas - as you say, if you get it right it should 'just work'. We had some extra complications in terms of shared partial schemas...


Cheers

Charlie


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
