Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

Erick Erickson Mon, 06 Oct 2014 19:54:07 -0700

I think some holes that could let replicas and leaders get out of synch
have been patched up in the last 3 releases.


There shouldn't be anything you need to do to keep these in synch, so
if you can capture what happened when things got out of synch we'll
fix it. But a lot has changed in the last several months, so the first
thing I'd do if possible is to upgrade to 4.10.1.


Best,
Erick

On Mon, Oct 6, 2014 at 2:41 PM, S.L <simpleliving...@gmail.com> wrote:
> Hi Erick,
>
> Before I tried your suggestion of issuing a commit=true update, I realized
> that for each shard there was at least one node that had its index directory
> named like index.<timestamp>.
>
> I went ahead, deleted that index directory, and restarted that core. Now the
> index directory has synced with the other node and is properly named
> 'index', without any timestamp attached to it. This now gives me consistent
> results for distrib=true through the load balancer. Also, distrib=false
> returns the expected results for a given shard.
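A minimal sketch for spotting the nodes S.L. describes, i.e. cores whose data directory contains an `index.<timestamp>` directory. It assumes the layout `<solr_home>/<core>/data/` (a hypothetical default; custom `dataDir` settings will differ) and that the suffix is all digits. My understanding is that Solr 4.x replication can legitimately download into such a directory and point to it from `index.properties`, so treat a match as a hint about where to look, not as proof of a broken index.

```python
import re
from pathlib import Path
from typing import List

# Assumed naming pattern: "index." followed by a numeric timestamp.
TIMESTAMPED_INDEX = re.compile(r"^index\.\d+$")


def find_timestamped_index_dirs(solr_home: str) -> List[str]:
    """Return index.<timestamp> directories, relative to solr_home.

    Assumes each core keeps its data under <solr_home>/<core>/data/
    (hypothetical layout for this sketch).
    """
    found = []
    for data_dir in sorted(Path(solr_home).glob("*/data")):
        for child in sorted(data_dir.iterdir()):
            # Only directories count; files such as index.properties are skipped.
            if child.is_dir() and TIMESTAMPED_INDEX.match(child.name):
                found.append(str(child.relative_to(solr_home)))
    return found
```

Run it on each of the six nodes; a node that reports a timestamped directory while its shard-mate does not is a candidate for the leader/replica mismatch discussed below.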
>
> The underlying issue appears to be that in every shard the leader and the
> replica (follower) were out of sync.
>
> How can I avoid this from happening again?
>
> Thanks for your help!
>
> ----- Reply message -----
> From: "Erick Erickson" <erickerick...@gmail.com>
> To: <solr-user@lucene.apache.org>
> Subject: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
> Date: Fri, Oct 3, 2014 12:56 AM
>
> Hmmmm. Assuming that you aren't re-indexing the doc you're searching for...
>
> Try issuing http://blah blah:8983/solr/collection/update?commit=true.
> That'll force all the docs to be searchable. Does <1> still hold for
> the document in question? Because this is exactly backwards of what
> I'd expect. I'd expect, if anything, the replica (I'm trying to call
> it the "follower" when a distinction needs to be made since the leader
> is a "replica" too....) would be out of sync. This is still a Bad
> Thing, but the leader gets first crack at indexing things.
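Erick's explicit commit, written as a small standard-library Python sketch. The host and collection names are placeholders (the thread only says `blah blah`), and `send_commit` needs a reachable Solr node; only the URL construction can be checked offline.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def commit_url(host: str, collection: str, port: int = 8983) -> str:
    """Build the hard-commit URL for a collection's /update handler."""
    return f"http://{host}:{port}/solr/{collection}/update?" + urlencode({"commit": "true"})


def send_commit(host: str, collection: str, port: int = 8983,
                timeout: float = 30.0) -> int:
    """Send the commit to a live node and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    with urlopen(commit_url(host, collection, port), timeout=timeout) as resp:
        return resp.status
```

For example, `send_commit("localhost", "collection1")` against a running node (hypothetical names). Once the commit returns, every document indexed so far should be searchable, which removes commit timing as an explanation for the leader and follower disagreeing.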
>
> bq: only the replica of the shard that has this key returns the result,
> and the leader does not.
>
> Just to be sure we're talking about the same thing. When you say
> "leader", you mean the shard leader, right? The filled-in circle on
> the graph view from the admin/cloud page.
>
> And let's see your soft and hard commit settings please.
>
> Best,
> Erick
>
> On Thu, Oct 2, 2014 at 9:48 PM, S.L <simpleliving...@gmail.com> wrote:
>> Erick,
>>
>> 0> Load balancer is out of the picture.
>> 1> When I query with *distrib=false*, I get consistent results as expected
>> for the shards that don't have the key, i.e. I don't get results back from
>> those shards. However, I just realized that with *distrib=false* in the
>> query, for the shard that is supposed to contain the key only the replica
>> returns the result and the leader does not. It looks like the replica and
>> the leader do not have the same data, and only the replica contains the
>> key for that shard.
>>
>> 2> By indexing I mean this collection is being populated by a web crawler.
>>
>> So it looks like 1> above points to the leader and replica being out of
>> synch for at least one shard.
>>
>>
>>
>> On Thu, Oct 2, 2014 at 11:57 PM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>
>>> bq: Also, the collection is being actively indexed as I query this, could
>>> that be an issue too?
>>>
>>> Not if the documents you're searching aren't being added as you search
>>> (and all your autocommit intervals have expired).
>>>
>>> I would turn off indexing for testing, it's just one more variable
>>> that can get in the way of understanding this.
>>>
>>> Do note that if the problem were endemic to Solr, there would probably
>>> be a _lot_ more noise out there.
>>>
>>> So to recap:
>>> 0> we can take the load balancer out of the picture altogether.
>>>
>>> 1> when you query each shard individually with &distrib=false, every
>>> replica in a particular shard returns the same count.
>>>
>>> 2> when you query without &distrib=false you get varying counts.
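Point 1> of the recap is an invariant that is easy to check mechanically: with indexing paused, every replica of a shard should report the same numFound for a distrib=false query. A minimal sketch, assuming the counts have already been collected into a dict keyed by shard and then by node (the shape and names are hypothetical, not a Solr API):

```python
from typing import Dict

# shard name -> {node name -> numFound from a distrib=false query to that node}
ShardCounts = Dict[str, Dict[str, int]]


def inconsistent_shards(counts: ShardCounts) -> ShardCounts:
    """Return only the shards whose replicas disagree on numFound."""
    return {
        shard: replicas
        for shard, replicas in counts.items()
        if len(set(replicas.values())) > 1
    }
```

A mismatch only means something once the crawler is stopped and the autocommit intervals have passed; otherwise a replica can simply be a commit behind, which is why Erick asks to turn indexing off for the test.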
>>>
>>> This is very strange and not at all expected. Let's try it again
>>> without indexing going on....
>>>
>>> And what do you mean by "indexing" anyway? How are documents being fed
>>> to your system?
>>>
>>> Best,
>>> Erick@PuzzledAsWell
>>>
>>> On Thu, Oct 2, 2014 at 7:32 PM, S.L <simpleliving...@gmail.com> wrote:
>>> > Erick,
>>> >
>>> > I would like to add that the interesting behavior, i.e. point #2 that I
>>> > mentioned in my earlier reply, happens in all the shards. If this were
>>> > a distributed search issue, it should not have manifested itself in the
>>> > shard that contains the key I am searching for; it looks like the search
>>> > is just failing as a whole, intermittently.
>>> >
>>> > Also, the collection is being actively indexed as I query this; could
>>> > that be an issue too?
>>> >
>>> > Thanks.
>>> >
>>> > On Thu, Oct 2, 2014 at 10:24 PM, S.L <simpleliving...@gmail.com> wrote:
>>> >
>>> >> Erick,
>>> >>
>>> >> Thanks for your reply, I tried your suggestions.
>>> >>
>>> >> 1. When not using the load balancer, if *I have distrib=false* I get
>>> >> consistent results across the replicas.
>>> >>
>>> >> 2. However, here's the interesting part: while not using the load
>>> >> balancer, if I *don't have distrib=false*, then when I query a particular
>>> >> node I get the same behaviour as if I were using a load balancer, meaning
>>> >> the distributed search from a node works intermittently. Does this give
>>> >> any clue?
>>> >>
>>> >>
>>> >>
>>> >> On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson <erickerick...@gmail.com>
>>> >> wrote:
>>> >>
>>> >>> Hmmm, nothing quite makes sense here....
>>> >>>
>>> >>> Here are some experiments:
>>> >>> 1> avoid the load balancer and issue queries like
>>> >>> http://solr_server:8983/solr/collection/select?q=whatever&distrib=false
>>> >>>
>>> >>> The &distrib=false bit will keep SolrCloud from trying to send
>>> >>> the queries anywhere else; they'll be served only from the node you
>>> >>> address them to.
>>> >>> That'll help check whether the nodes are consistent. You should be
>>> >>> getting back the same results from each replica in a shard (i.e. 2 of
>>> >>> your 6 machines).
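The per-node experiment can be scripted so that each of the six nodes is asked for a count only. A minimal sketch, assuming nodes are addressed as `host:port` and using the `fq` from the original question; `rows=0` and `wt=json` are standard Solr request parameters, while the helper names and hosts are hypothetical. Passing `distrib=True` repeats the same query with distributed search, which covers the "successive nodes" step as well.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(node: str, collection: str, query: str,
               fq: Optional[str] = None, distrib: bool = False) -> str:
    """Build a /select URL that asks one node for a hit count only (rows=0)."""
    params = {"q": query, "rows": 0, "wt": "json",
              "distrib": "true" if distrib else "false"}
    if fq is not None:
        params["fq"] = fq
    return f"http://{node}/solr/{collection}/select?{urlencode(params)}"


def num_found(node: str, collection: str, query: str,
              fq: Optional[str] = None, distrib: bool = False,
              timeout: float = 10.0) -> int:
    """Query a live node and return response.numFound from the JSON body."""
    url = select_url(node, collection, query, fq, distrib)
    with urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]["numFound"]
```

For example, calling `num_found(node, "collection", "*:*", fq="(id:9e78c064-919f-4ef3-b236-dc66351b4acf)")` for each hypothetical `solr1:8983` … `solr6:8983` should return 1 on both replicas of the owning shard and 0 elsewhere; with `distrib=True`, every node should return 1.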
>>> >>>
>>> >>> Next, try your failing query the same way.
>>> >>>
>>> >>> Next, try your failing query from a browser, pointing it at successive
>>> >>> nodes.
>>> >>>
>>> >>> Where is the first place problems show up?
>>> >>>
>>> >>> My _guess_ is that your load balancer isn't quite doing what you think,
>>> >>> or your cluster isn't set up the way you think it is, but those are
>>> >>> guesses.
>>> >>>
>>> >>> Best,
>>> >>> Erick
>>> >>>
>>> >>> On Thu, Oct 2, 2014 at 2:51 PM, S.L <simpleliving...@gmail.com> wrote:
>>> >>> > Hi All,
>>> >>> >
>>> >>> > I am trying to query a 6-node Solr 4.7 cluster with 3 shards and a
>>> >>> > replication factor of 2.
>>> >>> >
>>> >>> > I have fronted these 6 Solr nodes with a load balancer. What I notice
>>> >>> > is that every time I do a search of the form
>>> >>> > q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf), it gives me a
>>> >>> > result only once in every 3 tries, telling me that the load balancer
>>> >>> > is distributing the requests between the 3 shards and SolrCloud only
>>> >>> > returns a result if the request goes to the core that has that id.
>>> >>> >
>>> >>> > However, if I do a simple search like q=*:*, I consistently get the
>>> >>> > right aggregated results back for all the documents across all the
>>> >>> > shards for every request from the load balancer. Can someone please
>>> >>> > let me know what this is symptomatic of?
>>> >>> >
>>> >>> > Somehow SolrCloud seems to be doing search query distribution and
>>> >>> > aggregation for queries of type *:* only.
>>> >>> >
>>> >>> > Thanks.
>>> >>>
>>> >>
>>> >>
>>>
