Try adding shards.info=true and debug=track to your queries ... these will give more detailed information about what's going on behind the scenes.
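For example, appended to the query from your mail below:

http://server1.mydomain.com:8081/solr/dyCollection1/select/?q=*:*&fq=%28id:220a8dce-3b31-4d46-8386-da8405595c47%29&wt=json&distrib=true&shards.info=true&debug=track

Comparing the shards.info section between a request that finds the document and one that doesn't should show which replica each sub-request was routed to and how many hits each one reported.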
On Mon, Oct 13, 2014 at 11:11 PM, S.L <simpleliving...@gmail.com> wrote:
> Erick,
>
> I have upgraded to SolrCloud 4.10.1 with the same topology, 3 shards and a
> replication factor of 2, with six cores altogether.
>
> Unfortunately, I still see the issue of intermittently no results being
> returned. I am not able to figure out what's going on here; I have
> included the logging information below.
>
> *Here's the query that I run.*
>
> http://server1.mydomain.com:8081/solr/dyCollection1/select/?q=*:*&fq=%28id:220a8dce-3b31-4d46-8386-da8405595c47%29&wt=json&distrib=true
>
> *Scenario 1: No result returned.*
>
> *Log information for Scenario #1.*
>
> 92860314 [http-bio-8081-exec-103] INFO org.apache.solr.handler.component.SpellCheckComponent –
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/ null
> 92860315 [http-bio-8081-exec-103] INFO org.apache.solr.handler.component.SpellCheckComponent –
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/ null
> 92860315 [http-bio-8081-exec-103] INFO org.apache.solr.handler.component.SpellCheckComponent –
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/ null
> 92860315 [http-bio-8081-exec-103] INFO org.apache.solr.core.SolrCore – [dyCollection1_shard2_replica1] webapp=/solr path=/select/ params={q=*:*&distrib=true&wt=json&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)} hits=0 status=0 QTime=5
>
> *Scenario #2: I get a result back.*
>
> *Log information for Scenario #2.*
>
> 92881911 [http-bio-8081-exec-177] INFO org.apache.solr.core.SolrCore – [dyCollection1_shard2_replica1] webapp=/solr path=/select params={spellcheck=true&spellcheck.maxResultsForSuggest=5&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true&spellcheck.maxCollations=5&spellcheck.maxCollationTries=10&distrib=false&wt=javabin&spellcheck.collate=true&version=2&rows=10&NOW=1413251927427&shard.url=http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/&fl=productURL,score&df=suggestAggregate&start=0&q=*:*&spellcheck.dictionary=direct&spellcheck.dictionary=wordbreak&spellcheck.count=10&isShard=true&fsv=true&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)&spellcheck.alternativeTermCount=5} hits=1 status=0 QTime=1
> 92881913 [http-bio-8081-exec-177] INFO org.apache.solr.core.SolrCore – [dyCollection1_shard2_replica1] webapp=/solr path=/select params={spellcheck=false&spellcheck.maxResultsForSuggest=5&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true&ids=http://www.searcheddomain.com/p/ironwork-8-piece-comforter-set/-/A-15273248&spellcheck.maxCollations=5&spellcheck.maxCollationTries=10&distrib=false&wt=javabin&spellcheck.collate=true&version=2&rows=10&NOW=1413251927427&shard.url=http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/&df=suggestAggregate&q=*:*&spellcheck.dictionary=direct&spellcheck.dictionary=wordbreak&spellcheck.count=10&isShard=true&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)&spellcheck.alternativeTermCount=5} status=0 QTime=0
> 92881914 [http-bio-8081-exec-169] INFO org.apache.solr.handler.component.SpellCheckComponent –
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/ null
> 92881914 [http-bio-8081-exec-169] INFO org.apache.solr.handler.component.SpellCheckComponent –
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/ null
> 92881914 [http-bio-8081-exec-169] INFO org.apache.solr.handler.component.SpellCheckComponent –
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/ null
> 92881914 [http-bio-8081-exec-169] INFO org.apache.solr.handler.component.SpellCheckComponent –
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/ null
> 92881915 [http-bio-8081-exec-169] INFO org.apache.solr.core.SolrCore – [dyCollection1_shard2_replica1] webapp=/solr path=/select/ params={q=*:*&distrib=true&wt=json&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)} hits=1 status=0 QTime=7
>
> *Autocommit and soft commit settings.*
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
> </autoSoftCommit>
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>   <openSearcher>true</openSearcher>
> </autoCommit>
>
> On Tue, Oct 7, 2014 at 12:22 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> > Note: I'm not guaranteeing that it'll actually cure the problem, just
> > that enough has changed since 4.7 that it'd be a good place to start.
> >
> > Things have been reported off and on, but they're often pesky race
> > conditions or something else that takes a long time to track down;
> > you're just lucky, perhaps ;)...
> >
> > Erick
> >
> > On Mon, Oct 6, 2014 at 8:04 PM, S.L <simpleliving...@gmail.com> wrote:
> > > Erick,
> > >
> > > Thanks for the suggestion. I am not sure if I would be able to
> > > capture what went wrong, so upgrading to 4.10 seems easier even
> > > though it means a day's work of effort :). I will go ahead and
> > > upgrade and will let you know, although I am surprised that this
> > > issue never got reported for 4.7 up until now.
> > >
> > > Thanks again for your help!
> > >
> > > On Mon, Oct 6, 2014 at 10:52 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > >> I think there were some holes that would allow replicas and leaders
> > >> to be out of sync that have been patched up in the last 3 releases.
> > >>
> > >> There shouldn't be anything you need to do to keep these in sync,
> > >> so if you can capture what happened when things got out of sync
> > >> we'll fix it. But a lot has changed in the last several months, so
> > >> the first thing I'd do if possible is to upgrade to 4.10.1.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Mon, Oct 6, 2014 at 2:41 PM, S.L <simpleliving...@gmail.com> wrote:
> > >> > Hi Erick,
> > >> >
> > >> > Before I tried your suggestion of issuing a commit=true update, I
> > >> > realized that for each shard there was at least one node that had
> > >> > its index directory named like index.<timestamp>.
> > >> >
> > >> > I went ahead and deleted that index directory, which restarted the
> > >> > core, and now the index directory is in sync with the other node
> > >> > and is properly named 'index' without any timestamp attached to
> > >> > it. This now gives me consistent results for distrib=true using a
> > >> > load balancer. Also, distrib=false returns expected results for a
> > >> > given shard.
> > >> >
> > >> > The underlying issue appears to be that in every shard the leader
> > >> > and the replica (follower) were out of sync.
> > >> >
> > >> > How can I avoid this from happening again?
> > >> >
> > >> > Thanks for your help!
> > >> >
> > >> > Sent from my HTC
> > >> >
> > >> > ----- Reply message -----
> > >> > From: "Erick Erickson" <erickerick...@gmail.com>
> > >> > To: <solr-user@lucene.apache.org>
> > >> > Subject: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
> > >> > Date: Fri, Oct 3, 2014 12:56 AM
> > >> >
> > >> > Hmmmm. Assuming that you aren't re-indexing the doc you're
> > >> > searching for...
> > >> >
> > >> > Try issuing http://blah blah:8983/solr/collection/update?commit=true.
> > >> > That'll force all the docs to be searchable.
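> > >> > For instance, with concrete values filled in (the host and
> > >> > collection name here are taken from later messages in this thread;
> > >> > any node should do, since SolrCloud forwards the commit to the
> > >> > whole collection):
> > >> >
> > >> > http://server1.mydomain.com:8081/solr/dyCollection1/update?commit=true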
> > >> > Does <1> still hold for the document in question? Because this is
> > >> > exactly backwards from what I'd expect. I'd expect, if anything,
> > >> > the replica (I'm trying to call it the "follower" when a
> > >> > distinction needs to be made, since the leader is a "replica"
> > >> > too....) would be out of sync. This is still a Bad Thing, but the
> > >> > leader gets first crack at indexing things.
> > >> >
> > >> > bq: only the replica of the shard that has this key returns the
> > >> > result, and the leader does not
> > >> >
> > >> > Just to be sure we're talking about the same thing. When you say
> > >> > "leader", you mean the shard leader, right? The filled-in circle
> > >> > on the graph view from the admin/cloud page.
> > >> >
> > >> > And let's see your soft and hard commit settings please.
> > >> >
> > >> > Best,
> > >> > Erick
> > >> >
> > >> > On Thu, Oct 2, 2014 at 9:48 PM, S.L <simpleliving...@gmail.com> wrote:
> > >> >> Erick,
> > >> >>
> > >> >> 0> Load balancer is out of the picture.
> > >> >>
> > >> >> 1> When I query with *distrib=false*, I get consistent results
> > >> >> as expected for the shards that don't have the key, i.e. I don't
> > >> >> get results back from those shards. However, I just realized
> > >> >> that for the shard that is supposed to contain the key, with
> > >> >> *distrib=false* in the query only the replica of that shard
> > >> >> returns the result and the leader does not. It looks like the
> > >> >> replica and the leader do not have the same data, and it is the
> > >> >> replica that contains the key queried for in that shard.
> > >> >>
> > >> >> 2> By indexing I mean this collection is being populated by a
> > >> >> web crawler.
> > >> >>
> > >> >> So it looks like 1> above points to the leader and replica being
> > >> >> out of sync for at least one shard.
> > >> >>
> > >> >> On Thu, Oct 2, 2014 at 11:57 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >> >>
> > >> >>> bq: Also, the collection is being actively indexed as I query
> > >> >>> this, could that be an issue too?
> > >> >>>
> > >> >>> Not if the documents you're searching aren't being added as you
> > >> >>> search (and all your autocommit intervals have expired).
> > >> >>>
> > >> >>> I would turn off indexing for testing; it's just one more
> > >> >>> variable that can get in the way of understanding this.
> > >> >>>
> > >> >>> Do note that if the problem were endemic to Solr, there would
> > >> >>> probably be a _lot_ more noise out there.
> > >> >>>
> > >> >>> So to recap:
> > >> >>>
> > >> >>> 0> we can take the load balancer out of the picture altogether.
> > >> >>>
> > >> >>> 1> when you query each replica individually with
> > >> >>> &distrib=false, every replica in a particular shard returns the
> > >> >>> same count.
> > >> >>>
> > >> >>> 2> when you query without &distrib=false you get varying
> > >> >>> counts.
> > >> >>>
> > >> >>> This is very strange and not at all expected. Let's try it
> > >> >>> again without indexing going on....
> > >> >>>
> > >> >>> And what do you mean by "indexing" anyway? How are documents
> > >> >>> being fed to your system?
> > >> >>>
> > >> >>> Best,
> > >> >>> Erick@PuzzledAsWell
> > >> >>>
> > >> >>> On Thu, Oct 2, 2014 at 7:32 PM, S.L <simpleliving...@gmail.com> wrote:
> > >> >>> > Erick,
> > >> >>> >
> > >> >>> > I would like to add that the interesting behavior, i.e. point
> > >> >>> > #2 that I mentioned in my earlier reply, happens on all the
> > >> >>> > shards. If this were a distributed-search issue it should not
> > >> >>> > have manifested itself in the shard that contains the key I
> > >> >>> > am searching for; it looks like the search is just failing as
> > >> >>> > a whole, intermittently.
> > >> >>> >
> > >> >>> > Also, the collection is being actively indexed as I query
> > >> >>> > this; could that be an issue too?
> > >> >>> >
> > >> >>> > Thanks.
> > >> >>> >
> > >> >>> > On Thu, Oct 2, 2014 at 10:24 PM, S.L <simpleliving...@gmail.com> wrote:
> > >> >>> >
> > >> >>> >> Erick,
> > >> >>> >>
> > >> >>> >> Thanks for your reply, I tried your suggestions.
> > >> >>> >>
> > >> >>> >> 1. When not using the load balancer, if *I have
> > >> >>> >> distrib=false* I get consistent results across the replicas.
> > >> >>> >>
> > >> >>> >> 2. However, here's the interesting part: while not using the
> > >> >>> >> load balancer, if I *don't have distrib=false*, then when I
> > >> >>> >> query a particular node I get the same behavior as if I were
> > >> >>> >> using a load balancer, meaning the distributed search from a
> > >> >>> >> node works intermittently. Does this give any clue?
> > >> >>> >>
> > >> >>> >> On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >> >>> >>
> > >> >>> >>> Hmmm, nothing quite makes sense here....
> > >> >>> >>>
> > >> >>> >>> Here are some experiments:
> > >> >>> >>>
> > >> >>> >>> 1> avoid the load balancer and issue queries like
> > >> >>> >>> http://solr_server:8983/solr/collection/select?q=whatever&distrib=false
> > >> >>> >>>
> > >> >>> >>> The &distrib=false bit will keep SolrCloud from trying to
> > >> >>> >>> send the queries anywhere; they'll be served only from the
> > >> >>> >>> node you address them to. That'll help check whether the
> > >> >>> >>> nodes are consistent. You should be getting back the same
> > >> >>> >>> results from each replica in a shard (i.e. 2 of your 6
> > >> >>> >>> machines).
> > >> >>> >>>
> > >> >>> >>> Next, try your failing query the same way.
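> > >> >>> >>> For example, run the failing query against both replicas of
> > >> >>> >>> a single shard (host and core names below are placeholders
> > >> >>> >>> following the collection_shardN_replicaN naming convention):
> > >> >>> >>>
> > >> >>> >>> http://server1:8983/solr/collection_shard1_replica1/select?q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf)&distrib=false
> > >> >>> >>> http://server2:8983/solr/collection_shard1_replica2/select?q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf)&distrib=false
> > >> >>> >>>
> > >> >>> >>> If two replicas of the same shard return different counts,
> > >> >>> >>> the index copies themselves have diverged.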
> > >> >>> >>>
> > >> >>> >>> Next, try your failing query from a browser, pointing it at
> > >> >>> >>> successive nodes.
> > >> >>> >>>
> > >> >>> >>> Where is the first place problems show up?
> > >> >>> >>>
> > >> >>> >>> My _guess_ is that your load balancer isn't quite doing
> > >> >>> >>> what you think, or your cluster isn't set up the way you
> > >> >>> >>> think it is, but those are guesses.
> > >> >>> >>>
> > >> >>> >>> Best,
> > >> >>> >>> Erick
> > >> >>> >>>
> > >> >>> >>> On Thu, Oct 2, 2014 at 2:51 PM, S.L <simpleliving...@gmail.com> wrote:
> > >> >>> >>> > Hi All,
> > >> >>> >>> >
> > >> >>> >>> > I am trying to query a 6-node Solr 4.7 cluster with 3
> > >> >>> >>> > shards and a replication factor of 2.
> > >> >>> >>> >
> > >> >>> >>> > I have fronted these 6 Solr nodes with a load balancer.
> > >> >>> >>> > What I notice is that every time I do a search of the
> > >> >>> >>> > form q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf)
> > >> >>> >>> > it gives me a result only once in every 3 tries, telling
> > >> >>> >>> > me that the load balancer is distributing the requests
> > >> >>> >>> > between the 3 shards and SolrCloud only returns a result
> > >> >>> >>> > if the request goes to the core that has that id.
> > >> >>> >>> >
> > >> >>> >>> > However, if I do a simple search like q=*:*, I
> > >> >>> >>> > consistently get the right aggregated results back for
> > >> >>> >>> > all the documents across all the shards for every request
> > >> >>> >>> > from the load balancer. Can someone please let me know
> > >> >>> >>> > what this is symptomatic of?
> > >> >>> >>> >
> > >> >>> >>> > Somehow SolrCloud seems to be doing search query
> > >> >>> >>> > distribution and aggregation for queries of type *:*
> > >> >>> >>> > only.
> > >> >>> >>> >
> > >> >>> >>> > Thanks.