Tim,

Thanks for the suggestion.

I have rerun the query with shards.info=true and debug=track added. The
problem happens intermittently on SolrCloud 4.10.1 with 3 shards and a
replication factor of 2 (6 cores): I get a result on one execution of the
query and then no results on the next one. I have included the query output
with the additional parameters for both scenarios below. I am hoping someone
can help me find the root cause with this additional information.
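
For anyone who wants to reproduce the request, here is a minimal Python sketch of how the query with the two extra parameters could be assembled. The parameter values are the ones echoed back in the responseHeader of the responses below; the function name build_debug_query and the choice of server1 on port 8081 as the entry point are assumptions made for illustration only.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_debug_query(base_url, doc_id):
    """Return a /select URL with the shard-level diagnostics switched on."""
    params = {
        "q": "*:*",
        "fq": f"(id:{doc_id})",
        "wt": "xml",
        "distrib": "true",
        "shards.info": "true",  # per-shard numFound and the replica that answered
        "debug": "track",       # per-shard request/response trace
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


# The host and port below are an assumed entry point, not a confirmed one.
url = build_debug_query(
    "http://server1.mydomain.com:8081/solr/dyCollection1",
    "e8995da8-7d98-4010-93b4-8ff7dffb8bfb",
)
print(url)
```

Running the same URL several times in a row and saving each response makes it easier to compare a run that returns the document with one that does not.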

Thanks for your help!

*Scenario #1: In this attempt I get no results back. Here is what the query
returns.*

<?xml version="1.0" encoding="UTF-8"?>
<response>
   <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">29</int>
      <lst name="params">
         <str name="q">*:*</str>
         <str name="shards.info">true</str>
         <str name="distrib">true</str>
         <str name="debug">track</str>
         <str name="wt">xml</str>
         <str name="fq">(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb)</str>
      </lst>
   </lst>
   <lst name="shards.info">
      <lst name="
http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
">
         <long name="numFound">0</long>
         <float name="maxScore">0.0</float>
         <str name="shardAddress">
http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1</str>
         <long name="time">4</long>
      </lst>
      <lst name="
http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
">
         <long name="numFound">0</long>
         <float name="maxScore">0.0</float>
         <str name="shardAddress">
http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1</str>
         <long name="time">13</long>
      </lst>
      <lst name="
http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
">
         <long name="numFound">0</long>
         <float name="maxScore">0.0</float>
         <str name="shardAddress">
http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1</str>
         <long name="time">26</long>
      </lst>
   </lst>
   <result name="response" numFound="0" start="0" maxScore="0.0" />
   <lst name="spellcheck">
      <lst name="suggestions">
         <bool name="correctlySpelled">false</bool>
      </lst>
   </lst>
   <lst name="debug">
      <lst name="track">
         <str
name="rid">server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17</str>
         <lst name="EXECUTE_QUERY">
            <lst name="
http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
">
               <str name="QTime">1</str>
               <str name="ElapsedTime">4</str>
               <str name="RequestPurpose">GET_TOP_IDS</str>
               <str name="NumFound">0</str>
               <str
name="Response">{responseHeader={status=0,QTime=1,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
track],version=2,NOW=1413398784225,shard.url=
http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}</str>
            </lst>
            <lst name="
http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
">
               <str name="QTime">10</str>
               <str name="ElapsedTime">13</str>
               <str name="RequestPurpose">GET_TOP_IDS</str>
               <str name="NumFound">0</str>
               <str
name="Response">{responseHeader={status=0,QTime=10,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
track],version=2,NOW=1413398784225,shard.url=
http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}</str>
            </lst>
            <lst name="
http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
">
               <str name="QTime">24</str>
               <str name="ElapsedTime">26</str>
               <str name="RequestPurpose">GET_TOP_IDS</str>
               <str name="NumFound">0</str>
               <str
name="Response">{responseHeader={status=0,QTime=24,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
track],version=2,NOW=1413398784225,shard.url=
http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}</str>
            </lst>
         </lst>
      </lst>
   </lst>
</response>

*Scenario #2: In this attempt I get results back. Here is what the query
returns.*

<?xml version="1.0" encoding="UTF-8"?>
<response>
   <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">35</int>
      <lst name="params">
         <str name="q">*:*</str>
         <str name="shards.info">true</str>
         <str name="distrib">true</str>
         <str name="debug">track</str>
         <str name="wt">xml</str>
         <str name="fq">(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb)</str>
      </lst>
   </lst>
   <lst name="shards.info">
      <lst name="
http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
">
         <long name="numFound">0</long>
         <float name="maxScore">0.0</float>
         <str name="shardAddress">
http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2</str>
         <long name="time">4</long>
      </lst>
      <lst name="
http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
">
         <long name="numFound">1</long>
         <float name="maxScore">1.0</float>
         <str name="shardAddress">
http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2</str>
         <long name="time">17</long>
      </lst>
      <lst name="
http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
">
         <long name="numFound">0</long>
         <float name="maxScore">0.0</float>
         <str name="shardAddress">
http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2</str>
         <long name="time">29</long>
      </lst>
   </lst>
   <result name="response" numFound="1" start="0" maxScore="1.0">
      <doc>
         <str name="thingURL">http://www.redacted.com/ip/Cutter-Bite</str>

         <str name="id">e8995da8-7d98-4010-93b4-8ff7dffb8bfb</str>
         <long name="_version_">1481991045188157440</long>
      </doc>
   </result>
   <lst name="spellcheck">
      <lst name="suggestions">
         <bool name="correctlySpelled">false</bool>
      </lst>
   </lst>
   <lst name="debug">
      <lst name="track">
         <str
name="rid">server3.mydomain.com-dyCollection1_shard2_replica2-1413398738457-16</str>
         <lst name="EXECUTE_QUERY">
            <lst name="
http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
">
               <str name="QTime">2</str>
               <str name="ElapsedTime">4</str>
               <str name="RequestPurpose">GET_TOP_IDS</str>
               <str name="NumFound">0</str>
               <str
name="Response">{responseHeader={status=0,QTime=2,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
track],version=2,NOW=1413398738457,shard.url=
http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398738457-16,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}</str>
            </lst>
            <lst name="
http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
">
               <str name="QTime">14</str>
               <str name="ElapsedTime">17</str>
               <str name="RequestPurpose">GET_TOP_IDS</str>
               <str name="NumFound">1</str>
               <str
name="Response">{responseHeader={status=0,QTime=14,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
track],version=2,NOW=1413398738457,shard.url=
http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398738457-16,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
wordbreak],isShard=true}},response={numFound=1,start=0,maxScore=1.0,docs=[SolrDocument{thingURL=
http://www.redacted.com/ip/Cutter-Bite-MD-Insect-Bite-Relief-.5-fl-oz/12166875,
score=1.0}]},sort_values={},debug={}}</str>
            </lst>
            <lst name="
http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
">
               <str name="QTime">26</str>
               <str name="ElapsedTime">29</str>
               <str name="RequestPurpose">GET_TOP_IDS</str>
               <str name="NumFound">0</str>
               <str
name="Response">{responseHeader={status=0,QTime=26,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
track],version=2,NOW=1413398738457,shard.url=
http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398738457-16,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}</str>
            </lst>
         </lst>
         <lst name="GET_FIELDS">
            <lst name="
http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
">
               <str name="QTime">1</str>
               <str name="ElapsedTime">3</str>
               <str name="RequestPurpose">GET_FIELDS,GET_DEBUG</str>
               <str name="NumFound">1</str>
               <str
name="Response">{responseHeader={status=0,QTime=1,params={spellcheck=false,spellcheck.maxCollationTries=10,distrib=false,debug=[track,
track],version=2,df=suggestAggregate,shard.url=
http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/,NOW=1413398738457,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,ids=http://www.redacted.com/ip/Cutter-Bite,spellcheck.collate=true,wt=javabin,requestPurpose=GET_FIELDS,GET_DEBUG,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398738457-16,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
wordbreak],isShard=true}},response={numFound=1,start=0,docs=[SolrDocument{thingURL=
http://www.redacted.com/ip/Cutter-Bite,
id=e8995da8-7d98-4010-93b4-8ff7dffb8bfb,
_version_=1481991045188157440}]},debug={}}</str>
            </lst>
         </lst>
      </lst>
   </lst>
</response>
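
One way to read the two responses side by side is to pull the answering replica and numFound for each shard out of shards.info. The sketch below does that with the Python standard library. The XML is trimmed to the shard2 entries from the two scenarios (with the long name attribute shortened), and the helper names shard_summary and differing_shards are hypothetical. In the outputs above, shard2 was answered by replica1 with numFound=0 in Scenario #1 and by replica2 with numFound=1 in Scenario #2, which may mean the two replicas of shard2 do not hold the same data.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the shard2 entries from the two responses; the long
# pipe-separated name attribute is shortened here for readability.
SCENARIO_1 = """
<response>
  <lst name="shards.info">
    <lst name="shard2">
      <long name="numFound">0</long>
      <str name="shardAddress">http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1</str>
    </lst>
  </lst>
</response>
"""

SCENARIO_2 = """
<response>
  <lst name="shards.info">
    <lst name="shard2">
      <long name="numFound">1</long>
      <str name="shardAddress">http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2</str>
    </lst>
  </lst>
</response>
"""


def shard_summary(response_xml):
    """Map each shard to (answering core, numFound) using shards.info."""
    root = ET.fromstring(response_xml.strip())
    info = root.find("lst[@name='shards.info']")
    summary = {}
    for entry in info.findall("lst"):
        address = entry.find("str[@name='shardAddress']").text.strip()
        num_found = int(entry.find("long[@name='numFound']").text)
        core = address.rstrip("/").rsplit("/", 1)[-1]  # e.g. dyCollection1_shard2_replica1
        shard = core.rsplit("_replica", 1)[0]          # e.g. dyCollection1_shard2
        summary[shard] = (core, num_found)
    return summary


def differing_shards(first, second):
    """Return shards whose numFound differs between two summaries."""
    return {
        shard: (first[shard], second[shard])
        for shard in sorted(first.keys() & second.keys())
        if first[shard][1] != second[shard][1]
    }


diff = differing_shards(shard_summary(SCENARIO_1), shard_summary(SCENARIO_2))
for shard, (a, b) in diff.items():
    print(f"{shard}: {a[0]} numFound={a[1]} vs {b[0]} numFound={b[1]}")
```

Feeding the full saved responses through shard_summary would show the same comparison for all three shards, not only shard2.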

On Tue, Oct 14, 2014 at 10:32 AM, Tim Potter <tim.pot...@lucidworks.com>
wrote:

> Try adding shards.info=true and debug=track to your queries ... these will
> give more detailed information about what's going on behind the scenes.
>
> On Mon, Oct 13, 2014 at 11:11 PM, S.L <simpleliving...@gmail.com> wrote:
>
> > Erick,
> >
> > I have upgraded to SolrCloud 4.10.1 with the same topology: 3 shards and a
> > replication factor of 2, with six cores altogether.
> >
> > Unfortunately, I still see the issue of intermittently no results being
> > returned. I am not able to figure out what's going on here; I have included
> > the logging information below.
> >
> > *Here's the query that I run.*
> >
> >
> >
> http://server1.mydomain.com:8081/solr/dyCollection1/select/?q=*:*&fq=%28id:220a8dce-3b31-4d46-8386-da8405595c47%29&wt=json&distrib=true
> >
> >
> >
> > *Scenario 1: No result returned.*
> >
> > *Log Information for Scenario #1 .*
> > 92860314 [http-bio-8081-exec-103] INFO
> > org.apache.solr.handler.component.SpellCheckComponent  –
> >
> >
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> > null
> > 92860315 [http-bio-8081-exec-103] INFO
> > org.apache.solr.handler.component.SpellCheckComponent  –
> >
> >
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> > null
> > 92860315 [http-bio-8081-exec-103] INFO
> > org.apache.solr.handler.component.SpellCheckComponent  –
> >
> >
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> > null
> > 92860315 [http-bio-8081-exec-103] INFO  org.apache.solr.core.SolrCore  –
> > [dyCollection1_shard2_replica1] webapp=/solr path=/select/
> >
> >
> params={q=*:*&distrib=true&wt=json&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)}
> > hits=0 status=0 QTime=5
> >
> > *Scenario #2 : I get result back*
> >
> >
> >
> > *Log information for scenario #2.*
> > 92881911 [http-bio-8081-exec-177] INFO
> > org.apache.solr.core.SolrCore  – [dyCollection1_shard2_replica1]
> > webapp=/solr path=/select
> >
> >
> params={spellcheck=true&spellcheck.maxResultsForSuggest=5&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true&spellcheck.maxCollations=5&spellcheck.maxCollationTries=10&distrib=false&wt=javabin&spellcheck.collate=true&version=2&rows=10&NOW=1413251927427&shard.url=
> >
> >
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/&fl=productURL,score&df=suggestAggregate&start=0&q=*:*&spellcheck.dictionary=direct&spellcheck.dictionary=wordbreak&spellcheck.count=10&isShard=true&fsv=true&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)&spellcheck.alternativeTermCount=5
> > }
> > hits=1 status=0 QTime=1
> > 92881913 [http-bio-8081-exec-177] INFO  org.apache.solr.core.SolrCore  –
> > [dyCollection1_shard2_replica1] webapp=/solr path=/select
> >
> >
> params={spellcheck=false&spellcheck.maxResultsForSuggest=5&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true&ids=
> >
> >
> http://www.searcheddomain.com/p/ironwork-8-piece-comforter-set/-/A-15273248&spellcheck.maxCollations=5&spellcheck.maxCollationTries=10&distrib=false&wt=javabin&spellcheck.collate=true&version=2&rows=10&NOW=1413251927427&shard.url=http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/&df=suggestAggregate&q=*:*&spellcheck.dictionary=direct&spellcheck.dictionary=wordbreak&spellcheck.count=10&isShard=true&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)&spellcheck.alternativeTermCount=5
> > }
> > status=0 QTime=0
> > 92881914 [http-bio-8081-exec-169] INFO
> > org.apache.solr.handler.component.SpellCheckComponent  –
> >
> >
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> > null
> > 92881914 [http-bio-8081-exec-169] INFO
> > org.apache.solr.handler.component.SpellCheckComponent  –
> >
> >
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> > null
> > 92881914 [http-bio-8081-exec-169] INFO
> > org.apache.solr.handler.component.SpellCheckComponent  –
> >
> >
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> > null
> > 92881914 [http-bio-8081-exec-169] INFO
> > org.apache.solr.handler.component.SpellCheckComponent  –
> >
> >
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> > null
> > 92881915 [http-bio-8081-exec-169] INFO  org.apache.solr.core.SolrCore  –
> > [dyCollection1_shard2_replica1] webapp=/solr path=/select/
> >
> >
> params={q=*:*&distrib=true&wt=json&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)}
> > hits=1 status=0 QTime=7
> >
> >
> > *Autocommit and Soft commit settings.*
> >
> >      <autoSoftCommit>
> >        <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
> >      </autoSoftCommit>
> >
> >      <autoCommit>
> >        <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
> >
> >        <openSearcher>true</openSearcher>
> >      </autoCommit>
> >
> >
> >
> > On Tue, Oct 7, 2014 at 12:22 AM, Erick Erickson <erickerick...@gmail.com
> >
> > wrote:
> >
> > > Note, I'm not guaranteeing that it'll actually cure the problem, just
> > > that enough has changed since 4.7 that it'd be a good place to start.
> > >
> > > Things have been reported off and on, but they're often pesky race
> > > conditions or something else that takes a long time to track down, you
> > > just are lucky perhaps ;)...
> > >
> > > Erick
> > >
> > > On Mon, Oct 6, 2014 at 8:04 PM, S.L <simpleliving...@gmail.com> wrote:
> > > > Erick,
> > > >
> > > > Thanks for the suggestion. I am not sure I would be able to capture
> > > > what went wrong, so upgrading to 4.10 seems easier even though it means
> > > > a day's work of effort :). I will go ahead and upgrade and let you know,
> > > > although I am surprised that this issue never got reported for 4.7 up
> > > > until now.
> > > >
> > > > Thanks again for your help!
> > > >
> > > >
> > > >
> > > > On Mon, Oct 6, 2014 at 10:52 PM, Erick Erickson <
> > erickerick...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > >> I think there were some holes that would allow replicas and leaders to
> > > >> be out of synch that have been patched up in the last 3 releases.
> > > >>
> > > >> There shouldn't be anything you need to do to keep these in synch, so
> > > >> if you can capture what happened when things got out of synch we'll
> > > >> fix it. But a lot has changed in the last several months, so the first
> > > >> thing I'd do if possible is to upgrade to 4.10.1.
> > > >>
> > > >>
> > > >> Best,
> > > >> Erick
> > > >>
> > > >> On Mon, Oct 6, 2014 at 2:41 PM, S.L <simpleliving...@gmail.com>
> > wrote:
> > > >> > Hi Erick,
> > > >> >
> > > >> > Before I tried your suggestion of issuing a commit=true update, I
> > > >> > realized that for each shard there was at least one node that had its
> > > >> > index directory named like index.<timestamp>.
> > > >> >
> > > >> > I went ahead and deleted that index directory and restarted that core,
> > > >> > and now the index directory has synced with the other node and is
> > > >> > properly named 'index' without any timestamp attached to it. This is now
> > > >> > giving me consistent results for distrib=true using a load balancer. Also,
> > > >> > distrib=false returns the expected results for a given shard.
> > > >> >
> > > >> > The underlying issue appears to be that in every shard the leader and
> > > >> > the replica (follower) were out of sync.
> > > >> >
> > > >> > How can I prevent this from happening again?
> > > >> >
> > > >> > Thanks for your help!
> > > >> >
> > > >> > Sent from my HTC
> > > >> >
> > > >> > ----- Reply message -----
> > > >> > From: "Erick Erickson" <erickerick...@gmail.com>
> > > >> > To: <solr-user@lucene.apache.org>
> > > >> > Subject: SolrCloud 4.7 not doing distributed search when querying
> > > from a
> > > >> load balancer.
> > > >> > Date: Fri, Oct 3, 2014 12:56 AM
> > > >> >
> > > >> > Hmmmm. Assuming that you aren't re-indexing the doc you're searching
> > > >> > for...
> > > >> >
> > > >> > Try issuing http://blah blah:8983/solr/collection/update?commit=true.
> > > >> > That'll force all the docs to be searchable. Does <1> still hold for
> > > >> > the document in question? Because this is exactly backwards of what
> > > >> > I'd expect. I'd expect, if anything, the replica (I'm trying to call
> > > >> > it the "follower" when a distinction needs to be made since the leader
> > > >> > is a "replica" too....) would be out of sync. This is still a Bad
> > > >> > Thing, but the leader gets first crack at indexing things.
> > > >> >
> > > >> > bq: only the replica of the shard that has this key returns the result
> > > >> > , and the leader does not ,
> > > >> >
> > > >> > Just to be sure we're talking about the same thing. When you say
> > > >> > "leader", you mean the shard leader, right? The filled-in circle on
> > > >> > the graph view from the admin/cloud page.
> > > >> >
> > > >> > And let's see your soft and hard commit settings please.
> > > >> >
> > > >> > Best,
> > > >> > Erick
> > > >> >
> > > >> > On Thu, Oct 2, 2014 at 9:48 PM, S.L <simpleliving...@gmail.com>
> > > wrote:
> > > >> >> Erick,
> > > >> >>
> > > >> >> 0> The load balancer is out of the picture.
> > > >> >>
> > > >> >> 1> When I query with *distrib=false*, I get consistent results as
> > > >> >> expected for those shards that don't have the key, i.e. I don't get
> > > >> >> results back for those shards. However, I just realized that with
> > > >> >> *distrib=false* present in the query for the shard that is supposed to
> > > >> >> contain the key, only the replica of the shard that has this key returns
> > > >> >> the result, and the leader does not. It looks like the replica and the
> > > >> >> leader do not have the same data, and the replica seems to contain the
> > > >> >> key in the query for that shard.
> > > >> >>
> > > >> >> 2> By indexing I mean this collection is being populated by a web
> > > >> >> crawler.
> > > >> >>
> > > >> >> So it looks like 1> above is pointing to the leader and replica being out
> > > >> >> of sync for at least one shard.
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> On Thu, Oct 2, 2014 at 11:57 PM, Erick Erickson <
> > > >> erickerick...@gmail.com>
> > > >> >> wrote:
> > > >> >>
> > > >> >>> bq: Also ,the collection is being actively indexed as I query this,
> > > >> >>> could that be an issue too ?
> > > >> >>>
> > > >> >>> Not if the documents you're searching aren't being added as you search
> > > >> >>> (and all your autocommit intervals have expired).
> > > >> >>>
> > > >> >>> I would turn off indexing for testing, it's just one more variable
> > > >> >>> that can get in the way of understanding this.
> > > >> >>>
> > > >> >>> Do note that if the problem were endemic to Solr, there would probably
> > > >> >>> be a _lot_ more noise out there.
> > > >> >>>
> > > >> >>> So to recap:
> > > >> >>> 0> we can take the load balancer out of the picture altogether.
> > > >> >>>
> > > >> >>> 1> when you query each shard individually with &distrib=false, every
> > > >> >>> replica in a particular shard returns the same count.
> > > >> >>>
> > > >> >>> 2> when you query without &distrib=false you get varying counts.
> > > >> >>>
> > > >> >>> This is very strange and not at all expected. Let's try it again
> > > >> >>> without indexing going on....
> > > >> >>>
> > > >> >>> And what do you mean by "indexing" anyway? How are documents being fed
> > > >> >>> to your system?
> > > >> >>>
> > > >> >>> Best,
> > > >> >>> Erick@PuzzledAsWell
> > > >> >>>
> > > >> >>> On Thu, Oct 2, 2014 at 7:32 PM, S.L <simpleliving...@gmail.com>
> > > wrote:
> > > >> >>> > Erick,
> > > >> >>> >
> > > >> >>> > I would like to add that the interesting behavior, i.e. point #2 that I
> > > >> >>> > mentioned in my earlier reply, happens in all the shards. If this were
> > > >> >>> > a distributed search issue it should not have manifested itself in the
> > > >> >>> > shard that contains the key that I am searching for. It looks like the
> > > >> >>> > search is just failing as a whole intermittently.
> > > >> >>> >
> > > >> >>> > Also, the collection is being actively indexed as I query this; could
> > > >> >>> > that be an issue too?
> > > >> >>> >
> > > >> >>> > Thanks.
> > > >> >>> >
> > > >> >>> > On Thu, Oct 2, 2014 at 10:24 PM, S.L <
> simpleliving...@gmail.com
> > >
> > > >> wrote:
> > > >> >>> >
> > > >> >>> >> Erick,
> > > >> >>> >>
> > > >> >>> >> Thanks for your reply, I tried your suggestions.
> > > >> >>> >>
> > > >> >>> >> 1. When not using the load balancer, if *I have distrib=false* I get
> > > >> >>> >> consistent results across the replicas.
> > > >> >>> >>
> > > >> >>> >> 2. However, here's the interesting part: while not using the load
> > > >> >>> >> balancer, if I *don't have distrib=false*, then when I query a particular
> > > >> >>> >> node I get the same behaviour as if I were using a load balancer,
> > > >> >>> >> meaning the distributed search from a node works intermittently. Does
> > > >> >>> >> this give any clue?
> > > >> >>> >>
> > > >> >>> >>
> > > >> >>> >>
> > > >> >>> >> On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson <
> > > >> erickerick...@gmail.com
> > > >> >>> >
> > > >> >>> >> wrote:
> > > >> >>> >>
> > > >> >>> >>> Hmmm, nothing quite makes sense here....
> > > >> >>> >>>
> > > >> >>> >>> Here are some experiments:
> > > >> >>> >>> 1> avoid the load balancer and issue queries like
> > > >> >>> >>> http://solr_server:8983/solr/collection/select?q=whatever&distrib=false
> > > >> >>> >>>
> > > >> >>> >>> the &distrib=false bit will keep SolrCloud from trying to send
> > > >> >>> >>> the queries anywhere, they'll be served only from the node you address
> > > >> >>> >>> them to.
> > > >> >>> >>> that'll help check whether the nodes are consistent. You should be
> > > >> >>> >>> getting back the same results from each replica in a shard (i.e. 2 of
> > > >> >>> >>> your 6 machines).
> > > >> >>> >>>
> > > >> >>> >>> Next, try your failing query the same way.
> > > >> >>> >>>
> > > >> >>> >>> Next, try your failing query from a browser, pointing it at successive
> > > >> >>> >>> nodes.
> > > >> >>> >>>
> > > >> >>> >>> Where is the first place problems show up?
> > > >> >>> >>>
> > > >> >>> >>> My _guess_ is that your load balancer isn't quite doing what you think, or
> > > >> >>> >>> your cluster isn't set up the way you think it is, but those are guesses.
> > > >> >>> >>>
> > > >> >>> >>> Best,
> > > >> >>> >>> Erick
> > > >> >>> >>>
> > > >> >>> >>> On Thu, Oct 2, 2014 at 2:51 PM, S.L <
> > simpleliving...@gmail.com>
> > > >> wrote:
> > > >> >>> >>> > Hi All,
> > > >> >>> >>> >
> > > >> >>> >>> > I am trying to query a 6 node Solr 4.7 cluster with 3 shards and a
> > > >> >>> >>> > replication factor of 2.
> > > >> >>> >>> >
> > > >> >>> >>> > I have fronted these 6 Solr nodes using a load balancer. What I notice
> > > >> >>> >>> > is that every time I do a search of the form
> > > >> >>> >>> > q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf) it gives me a
> > > >> >>> >>> > result only once in every 3 tries, telling me that the load balancer is
> > > >> >>> >>> > distributing the requests between the 3 shards and SolrCloud only
> > > >> >>> >>> > returns a result if the request goes to the core that has that id.
> > > >> >>> >>> >
> > > >> >>> >>> > However, if I do a simple search like q=*:*, I consistently get the
> > > >> >>> >>> > right aggregated results back of all the documents across all the
> > > >> >>> >>> > shards for every request from the load balancer. Can someone please let
> > > >> >>> >>> > me know what this is symptomatic of?
> > > >> >>> >>> >
> > > >> >>> >>> > Somehow SolrCloud seems to be doing search query distribution and
> > > >> >>> >>> > aggregation for queries of type *:* only.
> > > >> >>> >>> >
> > > >> >>> >>> > Thanks.
> > > >> >>> >>>
> > > >> >>> >>
> > > >> >>> >>
> > > >> >>>
> > > >>
> > >
> >
>
