Re: Distributed search cross cluster

2018-01-31 Thread Jan Høydahl
Erick:
> ...one for each cluster and just merged the docs when it got them back
This would be the logical way. I'm afraid that "just merged the docs" is the crux here; that is what would make this an expensive task. You'd have to merge docs, facets, highlights etc., handle the different search phases
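To make the facet part of that merge concrete, here is a minimal sketch in Python. It assumes each cluster's facet counts have already been parsed into a plain dict of the form `{field: {value: count}}`; the function name `merge_facets` and that input shape are illustrative assumptions, not anything from Solr or SolrJ. Because the clusters hold different content, summing the counts is reasonable, but the totals are only exact if every cluster returned all of its facet values rather than a truncated top-N list.

```python
from collections import Counter


def merge_facets(per_cluster_facets):
    """Sum facet counts field by field across clusters.

    per_cluster_facets: iterable of dicts shaped {field: {value: count}}
    (a hypothetical, already-parsed form of each cluster's response).
    """
    merged = {}
    for facets in per_cluster_facets:
        for field, counts in facets.items():
            # Counter.update adds counts rather than overwriting them.
            merged.setdefault(field, Counter()).update(counts)
    # Return plain dicts, ordered by descending count within each field.
    return {field: dict(c.most_common()) for field, c in merged.items()}
```

Highlights and the later search phases would need their own handling and are not covered by this sketch.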

Re: Distributed search cross cluster

2018-01-31 Thread Jan Høydahl
Hi, I am an ex-FAST employee and actually used Unity a lot myself, even hacking the code and writing custom mixers etc. :) That is all cool if you want to write a generic federation layer. In our case we only ever need to talk to Solr instances with exactly the same schema and document types,

Re: Distributed search cross cluster

2018-01-31 Thread Bernd Fehling
Many years ago, in a different universe, when Federated Search was a buzzword we used Unity from FAST FDS (which is now MS ESP). It worked pretty well across many systems like FAST FDS, Google, Gigablast, ... Very flexible with different mixers, parsers, query transformers. Was written in Python

Re: Distributed search cross cluster

2018-01-31 Thread Charlie Hull
On 30/01/2018 16:09, Jan Høydahl wrote: Hi, A customer has 10 separate SolrCloud clusters, with same schema across all, but different content. Now they want users in each location to be able to federate a search across all locations. Each location is 100% independent, with separate ZK etc.

Re: Distributed search cross cluster

2018-01-30 Thread Erick Erickson
Jan: Hmmm, must Solr do the work? On some level it seems easier if your middle layer (behind your single IP) has 10 CloudSolrClient thread pools, one for each cluster and just merged the docs when it got them back. That would take care of all of the goodness of internal LBs and all that.
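A rough sketch of that middle-layer idea, in Python rather than SolrJ so that it stays self-contained. Each cluster is represented by a hypothetical callable `search(query, rows)` that returns documents as dicts with an `id` and a `score`, already sorted by descending score; in a real setup that callable would wrap a per-cluster client. The name `federated_search` and the `cluster` field it adds are assumptions for illustration only. Note that scores from independent indexes are computed from different index statistics, so merging on raw score is only an approximation.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def federated_search(clusters, query, rows=10):
    """Query every cluster in parallel and merge the top hits by score.

    clusters: dict mapping a cluster name to a callable(query, rows)
    that returns a list of {"id": ..., "score": ...} dicts sorted by
    descending score (a hypothetical stand-in for a real Solr client).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(clusters))) as pool:
        # Fan out: one request per cluster, all in flight at once.
        futures = {name: pool.submit(search, query, rows)
                   for name, search in clusters.items()}
        # Tag each doc with its origin so the caller can tell clusters apart.
        per_cluster = [
            [dict(doc, cluster=name) for doc in future.result()]
            for name, future in futures.items()
        ]
    # Each input list is sorted descending, so a k-way merge suffices.
    merged = heapq.merge(*per_cluster, key=lambda d: d["score"], reverse=True)
    return list(islice(merged, rows))
```

This sketch waits for every cluster and raises if any one of them fails; a production layer would likely want timeouts and partial results.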

Distributed search cross cluster

2018-01-30 Thread Jan Høydahl
Hi, A customer has 10 separate SolrCloud clusters, with same schema across all, but different content. Now they want users in each location to be able to federate a search across all locations. Each location is 100% independent, with separate ZK etc. Bandwidth and latency between the clusters