Re: Sharding with a SolrCloud

Erick Erickson Wed, 31 Jul 2013 08:47:12 -0700

Well, assuming you have solved the differences
in statistics between the index you maintain and
the one in the cloud with respect to the scoring...


My comment about indexing is probably
irrelevant, since you're not indexing anything to the
SolrCloud cluster.

But I still doubt this will work. Here's the problem:

Internally, the round-trip looks like this:
node 1 receives request
node 1 sends requests to all the shards
node 1 receives the top N docs from each shard
node 1 collates those to the "real" top N
node 1 then queries each shard for the docs hosted on that shard.
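
The round-trip above can be sketched roughly as follows. This is only a
minimal illustration in Python using in-memory stand-ins for the shards,
not how Solr implements it. The names SHARDS, top_ids, fetch_docs and
distributed_search are hypothetical and are not Solr APIs.

```python
import heapq

# Hypothetical in-memory shards: doc id -> (score, stored fields).
SHARDS = {
    "shard_a": {"a1": (8.5, {"title": "XML basics"}),
                "a2": (3.1, {"title": "XML schema"})},
    "shard_b": {"b1": (7.2, {"title": "XML in practice"}),
                "b2": (1.0, {"title": "Misc"})},
}


def top_ids(shard, n):
    """Phase 1: a shard returns only (score, id) pairs for its own top n docs."""
    docs = SHARDS[shard]
    return heapq.nlargest(n, ((score, doc_id) for doc_id, (score, _) in docs.items()))


def fetch_docs(shard, ids):
    """Phase 2: a shard returns stored fields, but only for ids it hosts."""
    docs = SHARDS[shard]
    return {doc_id: docs[doc_id][1] for doc_id in ids if doc_id in docs}


def distributed_search(shards, n):
    # Scatter: ask every shard for its top n (score, id) pairs.
    candidates = []
    for shard in shards:
        for score, doc_id in top_ids(shard, n):
            candidates.append((score, doc_id, shard))

    # Collate the per-shard lists into the "real" top n.
    winners = heapq.nlargest(n, candidates)

    # Group the winning ids by the shard that reported them.
    ids_by_shard = {}
    for _, doc_id, shard in winners:
        ids_by_shard.setdefault(shard, []).append(doc_id)

    # Gather: fetch the documents from the shard that hosts them.
    fetched = {}
    for shard, ids in ids_by_shard.items():
        fetched.update(fetch_docs(shard, ids))

    # Any id the second phase did not return drops out of the final list.
    return [(doc_id, score, fetched[doc_id])
            for score, doc_id, _ in winners if doc_id in fetched]


results = distributed_search(["shard_a", "shard_b"], 2)
```

In this sketch the second phase only works if the shard that reported an
id can also hand back the document for it. If it cannot, the id is
dropped from the final list without any error being raised.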

This last step is where I'd expect just adding a shard to
the list that happens to be a separate SolrCloud instance
to fall down: the originating node would expect to just get
the documents from the shard it knew about.

And if you list _all_ the shards in the SolrCloud instance,
then each of them will distribute the request to all shards
in the SolrCloud instance, confusing things even more.

Much of this is speculation, but I can imagine a number
of ways this scenario would go bad. It wasn't one of the
design goals as far as I know.

Best
Erick

On Wed, Jul 31, 2013 at 11:01 AM, Oliver Goldschmidt
<o.goldschm...@tuhh.de> wrote:
> Thank you very much for that information, Erick. That was what I was
> fearing...
>
> Well, the reason I am trying to do this is that the SolrCloud is
> managed by someone else. We are indexing some content to a pretty small
> local index. To this index we have complete access and can do whatever
> we want to do. But we also need the separate index, which is now moving
> into the cloud. It's not possible to put our local content into the
> cloud, because we are not maintaining it and have no write permission to it.
>
> But why shouldn't that work? Isn't Solr Cloud acting like one Solr
> server? The indices have to be maintained separately - can't I just
> continue using them as shards and get one result list from both of them
> (that's how I did it before they wanted to switch to Solr Cloud)?
>
> Though, if there is no way to use the cloud as a shard, we will have to
> think about how to solve that. Of course we can split up the queries and
> make two queries (one for the cloud and one for our local index). But
> this might be a bit confusing for the user.
>
> Thank you again, best
> - Oliver
>
> On 31.07.2013 16:39, Erick Erickson wrote:
>> You're in uncharted territory. I can imagine you using
>> a SolrCloud cluster as a separate Solr for a federated
>> search, but using it as a single shard just seems wrong.
>>
>> If nothing else, indexing to the shards will require that
>> the documents be routed correctly. But having one
>> shard in SolrCloud and another shard managed
>> externally seems ripe for getting the docs indexed
>> to various shards you're not expecting, unless you're
>> using explicit routing....
>>
>> All in all, this _really_ sounds like something you should
>> not be attempting. Why are you trying to do this? Is it
>> possible to just set up a SolrCloud cluster and index
>> all the docs to it and be done with it?
>>
>> 'cause I think you'll end up with endless problems given
>> what you've described.
>>
>> Best
>> Erick
>>
>> On Wed, Jul 31, 2013 at 5:16 AM, Oliver Goldschmidt
>> <o.goldschm...@tuhh.de> wrote:
>>> Hi list,
>>>
>>> I have a Solr server which uses sharding to perform distributed search
>>> with another Solr server. The other Solr server is now migrating to a Solr
>>> Cloud system. I've recently been trying to keep searching the Solr
>>> Cloud as a shard for my Solr server, but this is failing with mysterious
>>> effects. When I perform a search, I get a result with a number of hits,
>>> but the results are not displayed at all. This is the response
>>> header I am getting from Solr:
>>>
>>> {
>>>   "responseHeader":{
>>>     "status":0,
>>>     "QTime":305,
>>>     "params":{
>>>       "facet":"true",
>>>       "indent":"yes",
>>>       "facet.mincount":"1",
>>>       "facet.limit":"30",
>>>       "qf":"title_short^750 title_full_unstemmed^600",
>>>       "json.nl":"arrarr",
>>>       "wt":"json",
>>>       "rows":"20",
>>>       "shards":"ourindex.nowhere.de/solr/index",
>>>       "bq":"format:Book^500",
>>>       "fl":"*,score",
>>>       "facet.sort":"count",
>>>       "start":"0",
>>>       "q":"xml",
>>>       "shards.info":"true",
>>>       "facet.prefix":"",
>>>       "facet.field":["publishDate"],
>>>       "qt":"dismax"}},
>>>   "shards.info":{
>>>     "ourindex.nowhere.de/solr/index":{
>>>       "numFound":10076,
>>>       "maxScore":8.507474,
>>>       "time":263}},
>>>   "response":{"numFound":10056,"start":0,"maxScore":8.507474,"docs":[]
>>>   }}
>>>
>>> As you can see, there are no docs in the result. This result is not 100%
>>> reproducible: sometimes I get no results displayed, other times it works
>>> (with the same query URL!). As you can also see in the result, the
>>> number of hits in the response is slightly lower than the number of
>>> hits reported by the shard.
>>>
>>> This makes me wonder if it is not possible to use a Solr Cloud as a
>>> shard for another standalone Solr server?
>>>
>>> Any hint is appreciated!
>>>
>>> Best
>>> - Oliver
>>>
>
>
> --
> Oliver Goldschmidt
> TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
> Denickestr. 22
> 21071 Hamburg - Harburg
> Tel.    +49 (0)40 / 428 78 - 32 91
> eMail   o.goldschm...@tuhh.de
> --
> GPG/PGP key:
> http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc
>
