Looking at the logging information I provided below, it appears that this
SolrCloud cluster returns results only when the request goes to one
particular replica of the two replicas of a shard.

I have verified that numDocs is the same in the replicas of a given shard,
but maxDoc and deletedDocs differ. Does this signal that the replicas are
out of sync?

Even if numDocs is the same, how do we guarantee that those docs are
identical and have the same uniqueKeys? Is there a way to verify this?
Since numDocs is the same across the replicas, and I still get a result
back only when the request goes to one of the replicas of the shard, I
suspect that the replicas within a shard are not exact copies of each
other.
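
One way to check whether two replicas really hold the same uniqueKeys is to
list the ids from each core with distrib=false and compare the two sets. The
following is only a minimal sketch under assumptions: the helper names
(replica_query_url, fetch_ids, diff_ids) are hypothetical, the uniqueKey field
is assumed to be a sortable field called id, and start/rows paging is assumed
to be acceptable for the index size (it gets slow on large indexes).

```python
import json
import urllib.parse
import urllib.request


def replica_query_url(core_url, start=0, rows=1000):
    """Build a non-distributed query URL that lists only ids from one core."""
    params = urllib.parse.urlencode({
        "q": "*:*",
        "fl": "id",             # assumption: the uniqueKey field is named "id"
        "sort": "id asc",       # stable order so paging does not skip docs
        "distrib": "false",     # ask only this core, do not fan out
        "wt": "json",
        "start": start,
        "rows": rows,
    })
    return core_url.rstrip("/") + "/select?" + params


def fetch_ids(core_url, rows=1000):
    """Page through one core and collect every id it can currently search."""
    ids, start = set(), 0
    while True:
        url = replica_query_url(core_url, start, rows)
        with urllib.request.urlopen(url) as resp:
            docs = json.load(resp)["response"]["docs"]
        if not docs:
            return ids
        ids.update(doc["id"] for doc in docs)
        start += rows


def diff_ids(ids_a, ids_b):
    """Return (ids only in A, ids only in B); both empty means the sets match."""
    a, b = set(ids_a), set(ids_b)
    return a - b, b - a
```

Calling fetch_ids once per replica core URL of a shard (for example the two
dyCollection1_shard2 cores listed in the shards.info output) and passing both
results to diff_ids would show which ids, if any, exist on only one replica.
Indexing should be paused while comparing, otherwise in-flight updates can
show up as false differences.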

I suspect the issue I am facing in SolrCloud 4.10.1 is related to
https://issues.apache.org/jira/browse/SOLR-4924 .

Can anyone please let me know how to solve this issue of intermittent no
results for a query?



On Wed, Oct 15, 2014 at 3:15 PM, S.L <simpleliving...@gmail.com> wrote:

> Tim,
>
> Thanks for the suggestion.
>
> I have rerun the query after adding shards.info=true and debug=track. I
> have included the XML data for both scenarios below. This happens
> intermittently on SolrCloud 4.10.1, with a replication factor of 2 and 3
> shards (6 cores): I get a result in one execution of the query and then no
> results for the subsequent one. I am hoping someone will be able to help
> me find the root cause with this additional information. The query output
> with the additional parameters for both scenarios is included
> below.
>
> Thanks for your help!
>
> *Scenario #1 : In this try I get no results back. Here is what the query
> returns.*
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>    <lst name="responseHeader">
>       <int name="status">0</int>
>       <int name="QTime">29</int>
>       <lst name="params">
>          <str name="q">*:*</str>
>          <str name="shards.info">true</str>
>          <str name="distrib">true</str>
>          <str name="debug">track</str>
>          <str name="wt">xml</str>
>          <str name="fq">(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb)</str>
>       </lst>
>    </lst>
>    <lst name="shards.info">
>       <lst name="
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> ">
>          <long name="numFound">0</long>
>          <float name="maxScore">0.0</float>
>          <str name="shardAddress">
> http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1</str>
>          <long name="time">4</long>
>       </lst>
>       <lst name="
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> ">
>          <long name="numFound">0</long>
>          <float name="maxScore">0.0</float>
>          <str name="shardAddress">
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1</str>
>          <long name="time">13</long>
>       </lst>
>       <lst name="
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> ">
>          <long name="numFound">0</long>
>          <float name="maxScore">0.0</float>
>          <str name="shardAddress">
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1</str>
>          <long name="time">26</long>
>       </lst>
>    </lst>
>    <result name="response" numFound="0" start="0" maxScore="0.0" />
>    <lst name="spellcheck">
>       <lst name="suggestions">
>          <bool name="correctlySpelled">false</bool>
>       </lst>
>    </lst>
>    <lst name="debug">
>       <lst name="track">
>          <str
> name="rid">server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17</str>
>          <lst name="EXECUTE_QUERY">
>             <lst name="
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> ">
>                <str name="QTime">1</str>
>                <str name="ElapsedTime">4</str>
>                <str name="RequestPurpose">GET_TOP_IDS</str>
>                <str name="NumFound">0</str>
>                <str
> name="Response">{responseHeader={status=0,QTime=1,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
> track],version=2,NOW=1413398784225,shard.url=
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
> wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}</str>
>             </lst>
>             <lst name="
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> ">
>                <str name="QTime">10</str>
>                <str name="ElapsedTime">13</str>
>                <str name="RequestPurpose">GET_TOP_IDS</str>
>                <str name="NumFound">0</str>
>                <str
> name="Response">{responseHeader={status=0,QTime=10,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
> track],version=2,NOW=1413398784225,shard.url=
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
> wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}</str>
>             </lst>
>             <lst name="
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> ">
>                <str name="QTime">24</str>
>                <str name="ElapsedTime">26</str>
>                <str name="RequestPurpose">GET_TOP_IDS</str>
>                <str name="NumFound">0</str>
>                <str
> name="Response">{responseHeader={status=0,QTime=24,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
> track],version=2,NOW=1413398784225,shard.url=
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
> wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}</str>
>             </lst>
>          </lst>
>       </lst>
>    </lst>
> </response>
>
> *Scenario #2 : In this try I get results back. Here is what the query
> returns.*
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>    <lst name="responseHeader">
>       <int name="status">0</int>
>       <int name="QTime">35</int>
>       <lst name="params">
>          <str name="q">*:*</str>
>          <str name="shards.info">true</str>
>          <str name="distrib">true</str>
>          <str name="debug">track</str>
>          <str name="wt">xml</str>
>          <str name="fq">(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb)</str>
>       </lst>
>    </lst>
>    <lst name="shards.info">
>       <lst name="
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> ">
>          <long name="numFound">0</long>
>          <float name="maxScore">0.0</float>
>          <str name="shardAddress">
> http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2</str>
>          <long name="time">4</long>
>       </lst>
>       <lst name="
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> ">
>          <long name="numFound">1</long>
>          <float name="maxScore">1.0</float>
>          <str name="shardAddress">
> http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2</str>
>          <long name="time">17</long>
>       </lst>
>       <lst name="
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> ">
>          <long name="numFound">0</long>
>          <float name="maxScore">0.0</float>
>          <str name="shardAddress">
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2</str>
>          <long name="time">29</long>
>       </lst>
>    </lst>
>    <result name="response" numFound="1" start="0" maxScore="1.0">
>       <doc>
>          <str name="thingURL">http://www.redacted.com/ip/Cutter-Bite</str>
>
>          <str name="id">e8995da8-7d98-4010-93b4-8ff7dffb8bfb</str>
>          <long name="_version_">1481991045188157440</long>
>       </doc>
>    </result>
>    <lst name="spellcheck">
>       <lst name="suggestions">
>          <bool name="correctlySpelled">false</bool>
>       </lst>
>    </lst>
>    <lst name="debug">
>       <lst name="track">
>          <str
> name="rid">server3.mydomain.com-dyCollection1_shard2_replica2-1413398738457-16</str>
>          <lst name="EXECUTE_QUERY">
>             <lst name="
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> ">
>                <str name="QTime">2</str>
>                <str name="ElapsedTime">4</str>
>                <str name="RequestPurpose">GET_TOP_IDS</str>
>                <str name="NumFound">0</str>
>                <str
> name="Response">{responseHeader={status=0,QTime=2,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
> track],version=2,NOW=1413398738457,shard.url=
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398738457-16,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
> wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}</str>
>             </lst>
>             <lst name="
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> ">
>                <str name="QTime">14</str>
>                <str name="ElapsedTime">17</str>
>                <str name="RequestPurpose">GET_TOP_IDS</str>
>                <str name="NumFound">1</str>
>                <str
> name="Response">{responseHeader={status=0,QTime=14,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
> track],version=2,NOW=1413398738457,shard.url=
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398738457-16,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
> wordbreak],isShard=true}},response={numFound=1,start=0,maxScore=1.0,docs=[SolrDocument{thingURL=
> http://www.redacted.com/ip/Cutter-Bite-MD-Insect-Bite-Relief-.5-fl-oz/12166875,
> score=1.0}]},sort_values={},debug={}}</str>
>             </lst>
>             <lst name="
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> ">
>                <str name="QTime">26</str>
>                <str name="ElapsedTime">29</str>
>                <str name="RequestPurpose">GET_TOP_IDS</str>
>                <str name="NumFound">0</str>
>                <str
> name="Response">{responseHeader={status=0,QTime=26,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
> track],version=2,NOW=1413398738457,shard.url=
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398738457-16,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
> wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}</str>
>             </lst>
>          </lst>
>          <lst name="GET_FIELDS">
>             <lst name="
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> ">
>                <str name="QTime">1</str>
>                <str name="ElapsedTime">3</str>
>                <str name="RequestPurpose">GET_FIELDS,GET_DEBUG</str>
>                <str name="NumFound">1</str>
>                <str
> name="Response">{responseHeader={status=0,QTime=1,params={spellcheck=false,spellcheck.maxCollationTries=10,distrib=false,debug=[track,
> track],version=2,df=suggestAggregate,shard.url=
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/,NOW=1413398738457,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,ids=http://www.redacted.com/ip/Cutter-Bite,spellcheck.collate=true,wt=javabin,requestPurpose=GET_FIELDS,GET_DEBUG,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398738457-16,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
> wordbreak],isShard=true}},response={numFound=1,start=0,docs=[SolrDocument{thingURL=
> http://www.redacted.com/ip/Cutter-Bite,
> id=e8995da8-7d98-4010-93b4-8ff7dffb8bfb,
> _version_=1481991045188157440}]},debug={}}</str>
>             </lst>
>          </lst>
>       </lst>
>    </lst>
> </response>
>
> On Tue, Oct 14, 2014 at 10:32 AM, Tim Potter <tim.pot...@lucidworks.com>
> wrote:
>
>> Try adding shards.info=true and debug=track to your queries ... these
>> will give more detailed information about what's going on behind the
>> scenes.
>>
>> On Mon, Oct 13, 2014 at 11:11 PM, S.L <simpleliving...@gmail.com> wrote:
>>
>> > Erick,
>> >
>> > I have upgraded to SolrCloud 4.10.1 with the same topology: 3 shards
>> > and a replication factor of 2, with six cores altogether.
>> >
>> > Unfortunately, I still see the issue of intermittently no results being
>> > returned. I am not able to figure out what's going on here; I have
>> > included the logging information below.
>> >
>> > *Here's the query that I run.*
>> >
>> >
>> >
>> http://server1.mydomain.com:8081/solr/dyCollection1/select/?q=*:*&fq=%28id:220a8dce-3b31-4d46-8386-da8405595c47%29&wt=json&distrib=true
>> >
>> >
>> >
>> > *Scenario 1: No result returned.*
>> >
>> > *Log Information for Scenario #1 .*
>> > 92860314 [http-bio-8081-exec-103] INFO
>> > org.apache.solr.handler.component.SpellCheckComponent  –
>> >
>> >
>> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
>> > null
>> > 92860315 [http-bio-8081-exec-103] INFO
>> > org.apache.solr.handler.component.SpellCheckComponent  –
>> >
>> >
>> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
>> > null
>> > 92860315 [http-bio-8081-exec-103] INFO
>> > org.apache.solr.handler.component.SpellCheckComponent  –
>> >
>> >
>> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
>> > null
>> > 92860315 [http-bio-8081-exec-103] INFO  org.apache.solr.core.SolrCore  –
>> > [dyCollection1_shard2_replica1] webapp=/solr path=/select/
>> >
>> >
>> params={q=*:*&distrib=true&wt=json&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)}
>> > hits=0 status=0 QTime=5
>> >
>> > *Scenario #2 : I get result back*
>> >
>> >
>> >
>> > *Log information for scenario #2.*
>> > 92881911 [http-bio-8081-exec-177] INFO
>> > org.apache.solr.core.SolrCore  – [dyCollection1_shard2_replica1]
>> > webapp=/solr path=/select
>> >
>> >
>> params={spellcheck=true&spellcheck.maxResultsForSuggest=5&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true&spellcheck.maxCollations=5&spellcheck.maxCollationTries=10&distrib=false&wt=javabin&spellcheck.collate=true&version=2&rows=10&NOW=1413251927427&shard.url=
>> >
>> >
>> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/&fl=productURL,score&df=suggestAggregate&start=0&q=*:*&spellcheck.dictionary=direct&spellcheck.dictionary=wordbreak&spellcheck.count=10&isShard=true&fsv=true&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)&spellcheck.alternativeTermCount=5
>> > }
>> > hits=1 status=0 QTime=1
>> > 92881913 [http-bio-8081-exec-177] INFO  org.apache.solr.core.SolrCore  –
>> > [dyCollection1_shard2_replica1] webapp=/solr path=/select
>> >
>> >
>> params={spellcheck=false&spellcheck.maxResultsForSuggest=5&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true&ids=
>> >
>> >
>> http://www.searcheddomain.com/p/ironwork-8-piece-comforter-set/-/A-15273248&spellcheck.maxCollations=5&spellcheck.maxCollationTries=10&distrib=false&wt=javabin&spellcheck.collate=true&version=2&rows=10&NOW=1413251927427&shard.url=http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/&df=suggestAggregate&q=*:*&spellcheck.dictionary=direct&spellcheck.dictionary=wordbreak&spellcheck.count=10&isShard=true&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)&spellcheck.alternativeTermCount=5
>> > }
>> > status=0 QTime=0
>> > 92881914 [http-bio-8081-exec-169] INFO
>> > org.apache.solr.handler.component.SpellCheckComponent  –
>> >
>> >
>> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
>> > null
>> > 92881914 [http-bio-8081-exec-169] INFO
>> > org.apache.solr.handler.component.SpellCheckComponent  –
>> >
>> >
>> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
>> > null
>> > 92881914 [http-bio-8081-exec-169] INFO
>> > org.apache.solr.handler.component.SpellCheckComponent  –
>> >
>> >
>> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
>> > null
>> > 92881914 [http-bio-8081-exec-169] INFO
>> > org.apache.solr.handler.component.SpellCheckComponent  –
>> >
>> >
>> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
>> > null
>> > 92881915 [http-bio-8081-exec-169] INFO  org.apache.solr.core.SolrCore  –
>> > [dyCollection1_shard2_replica1] webapp=/solr path=/select/
>> >
>> >
>> params={q=*:*&distrib=true&wt=json&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)}
>> > hits=1 status=0 QTime=7
>> >
>> >
>> > *Autocommit and Soft commit settings.*
>> >
>> >      <autoSoftCommit>
>> >        <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>> >      </autoSoftCommit>
>> >
>> >      <autoCommit>
>> >        <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>> >
>> >        <openSearcher>true</openSearcher>
>> >      </autoCommit>
>> >
>> >
>> >
>> > On Tue, Oct 7, 2014 at 12:22 AM, Erick Erickson <
>> erickerick...@gmail.com>
>> > wrote:
>> >
>> > > Note, I'm not guaranteeing that it'll actually cure the problem, just
>> > > that enough has changed since 4.7 that it'd be a good place to start.
>> > >
>> > > Things have been reported off and on, but they're often pesky race
>> > > conditions or something else that takes a long time to track down, you
>> > > just are lucky perhaps ;)...
>> > >
>> > > Erick
>> > >
>> > > On Mon, Oct 6, 2014 at 8:04 PM, S.L <simpleliving...@gmail.com>
>> wrote:
>> > > > Erick,
>> > > >
>> > > > Thanks for the suggestion. I am not sure if I would be able to
>> > > > capture what went wrong, so upgrading to 4.10 seems easier even
>> > > > though it means a day's work of effort :) . I will go ahead and
>> > > > upgrade and let you know, although I am surprised that this issue
>> > > > never got reported for 4.7 up until now.
>> > > >
>> > > > Thanks again for your help!
>> > > >
>> > > >
>> > > >
>> > > > On Mon, Oct 6, 2014 at 10:52 PM, Erick Erickson <
>> > erickerick...@gmail.com
>> > > >
>> > > > wrote:
>> > > >
>> > > >> I think there were some holes that would allow replicas and
>> leaders to
>> > > >> be out of synch that have been patched up in the last 3 releases.
>> > > >>
>> > > >> There shouldn't be anything you need to do to keep these in synch,
>> so
>> > > >> if you can capture what happened when things got out of synch we'll
>> > > >> fix it. But a lot has changed in the last several months, so the
>> first
>> > > >> thing I'd do if possible is to upgrade to 4.10.1.
>> > > >>
>> > > >>
>> > > >> Best,
>> > > >> Erick
>> > > >>
>> > > >> On Mon, Oct 6, 2014 at 2:41 PM, S.L <simpleliving...@gmail.com>
>> > wrote:
>> > > >> > Hi Erick,
>> > > >> >
>> > > >> > Before I tried your suggestion of issuing a commit=true update, I
>> > > >> > realized that for each shard there was at least one node that had
>> > > >> > its index directory named like index.<timestamp>.
>> > > >> >
>> > > >> > I went ahead and deleted that index directory and restarted the
>> > > >> > core, and now the index directory has synced with the other node
>> > > >> > and is properly named 'index' without any timestamp attached to
>> > > >> > it. This is now giving me consistent results for distrib=true
>> > > >> > using a load balancer. Also, distrib=false returns the expected
>> > > >> > results for a given shard.
>> > > >> >
>> > > >> > The underlying issue appears to be that in every shard the leader
>> > > >> > and the replica (follower) were out of sync.
>> > > >> >
>> > > >> > How can I avoid this from happening again?
>> > > >> >
>> > > >> > Thanks for your help!
>> > > >> >
>> > > >> > Sent from my HTC
>> > > >> >
>> > > >> > ----- Reply message -----
>> > > >> > From: "Erick Erickson" <erickerick...@gmail.com>
>> > > >> > To: <solr-user@lucene.apache.org>
>> > > >> > Subject: SolrCloud 4.7 not doing distributed search when querying
>> > > from a
>> > > >> load balancer.
>> > > >> > Date: Fri, Oct 3, 2014 12:56 AM
>> > > >> >
>> > > >> > Hmmmm. Assuming that you aren't re-indexing the doc you're
>> searching
>> > > >> for...
>> > > >> >
>> > > >> > Try issuing http://blah
>> > blah:8983/solr/collection/update?commit=true.
>> > > >> > That'll force all the docs to be searchable. Does <1> still hold
>> for
>> > > >> > the document in question? Because this is exactly backwards of
>> what
>> > > >> > I'd expect. I'd expect, if anything, the replica (I'm trying to
>> call
>> > > >> > it the "follower" when a distinction needs to be made since the
>> > leader
>> > > >> > is a "replica" too....) would be out of sync. This is still a Bad
>> > > >> > Thing, but the leader gets first crack at indexing things.
>> > > >> >
>> > > >> > bq: only the replica of the shard that has this key returns the
>> > result
>> > > >> > , and the leader does not ,
>> > > >> >
>> > > >> > Just to be sure we're talking about the same thing. When you say
>> > > >> > "leader", you mean the shard leader, right? The filled-in circle
>> on
>> > > >> > the graph view from the admin/cloud page.
>> > > >> >
>> > > >> > And let's see your soft and hard commit settings please.
>> > > >> >
>> > > >> > Best,
>> > > >> > Erick
>> > > >> >
>> > > >> > On Thu, Oct 2, 2014 at 9:48 PM, S.L <simpleliving...@gmail.com>
>> > > wrote:
>> > > >> >> Erick,
>> > > >> >>
>> > > >> >> 0> The load balancer is out of the picture.
>> > > >> >>
>> > > >> >> 1> When I query with *distrib=false*, I get consistent results as
>> > > >> >> expected for those shards that don't have the key, i.e. I don't
>> > > >> >> get results back for those shards. However, I just realized that
>> > > >> >> with *distrib=false* present in the query for the shard that is
>> > > >> >> supposed to contain the key, only the replica of the shard that
>> > > >> >> has this key returns the result, and the leader does not. It
>> > > >> >> looks like the replica and the leader do not have the same data,
>> > > >> >> and the replica seems to contain the key in the query for that
>> > > >> >> shard.
>> > > >> >>
>> > > >> >> 2> By indexing I mean this collection is being populated by a web
>> > > >> >> crawler.
>> > > >> >>
>> > > >> >> So it looks like 1> above is pointing to the leader and replica
>> > > >> >> being out of sync for at least one shard.
>> > > >> >>
>> > > >> >>
>> > > >> >>
>> > > >> >> On Thu, Oct 2, 2014 at 11:57 PM, Erick Erickson <
>> > > >> erickerick...@gmail.com>
>> > > >> >> wrote:
>> > > >> >>
>> > > >> >>> bq: Also ,the collection is being actively indexed as I query
>> > this,
>> > > >> could
>> > > >> >>> that
>> > > >> >>> be an issue too ?
>> > > >> >>>
>> > > >> >>> Not if the documents you're searching aren't being added as you
>> > > search
>> > > >> >>> (and all your autocommit intervals have expired).
>> > > >> >>>
>> > > >> >>> I would turn off indexing for testing, it's just one more
>> variable
>> > > >> >>> that can get in the way of understanding this.
>> > > >> >>>
>> > > >> >>> Do note that if the problem were endemic to Solr, there would
>> > > probably
>> > > >> >>> be a _lot_ more noise out there.
>> > > >> >>>
>> > > >> >>> So to recap:
>> > > >> >>> 0> we can take the load balancer out of the picture altogether.
>> > > >> >>>
>> > > >> >>> 1> when you query each shard individually with &distrib=false,
>> > > >> >>> every replica in a particular shard returns the same count.
>> > > >> >>>
>> > > >> >>> 2> when you query without &distrib=false you get varying counts.
>> > > >> >>>
>> > > >> >>> This is very strange and not at all expected. Let's try it
>> again
>> > > >> >>> without indexing going on....
>> > > >> >>>
>> > > >> >>> And what do you mean by "indexing" anyway? How are documents
>> being
>> > > fed
>> > > >> >>> to your system?
>> > > >> >>>
>> > > >> >>> Best,
>> > > >> >>> Erick@PuzzledAsWell
>> > > >> >>>
>> > > >> >>> On Thu, Oct 2, 2014 at 7:32 PM, S.L <simpleliving...@gmail.com
>> >
>> > > wrote:
>> > > >> >>> > Erick,
>> > > >> >>> >
>> > > >> >>> > I would like to add that the interesting behavior, i.e. point #2
>> > > >> >>> > that I mentioned in my earlier reply, happens in all the shards.
>> > > >> >>> > If this were a distributed search issue, it should not have
>> > > >> >>> > manifested itself in the shard that contains the key that I am
>> > > >> >>> > searching for. It looks like the search is just failing as a
>> > > >> >>> > whole intermittently.
>> > > >> >>> >
>> > > >> >>> > Also, the collection is being actively indexed as I query this.
>> > > >> >>> > Could that be an issue too?
>> > > >> >>> >
>> > > >> >>> > Thanks.
>> > > >> >>> >
>> > > >> >>> > On Thu, Oct 2, 2014 at 10:24 PM, S.L <
>> simpleliving...@gmail.com
>> > >
>> > > >> wrote:
>> > > >> >>> >
>> > > >> >>> >> Erick,
>> > > >> >>> >>
>> > > >> >>> >> Thanks for your reply, I tried your suggestions.
>> > > >> >>> >>
>> > > >> >>> >> 1. When not using the load balancer, if *I have distrib=false* I
>> > > >> >>> >> get consistent results across the replicas.
>> > > >> >>> >>
>> > > >> >>> >> 2. However, here's the interesting part: while not using the load
>> > > >> >>> >> balancer, if I *don't have distrib=false*, then when I query a
>> > > >> >>> >> particular node I get the same behaviour as if I were using a
>> > > >> >>> >> load balancer, meaning the distributed search from a node works
>> > > >> >>> >> intermittently. Does this give any clue?
>> > > >> >>> >>
>> > > >> >>> >>
>> > > >> >>> >>
>> > > >> >>> >> On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson <
>> > > >> erickerick...@gmail.com
>> > > >> >>> >
>> > > >> >>> >> wrote:
>> > > >> >>> >>
>> > > >> >>> >>> Hmmm, nothing quite makes sense here....
>> > > >> >>> >>>
>> > > >> >>> >>> Here are some experiments:
>> > > >> >>> >>> 1> avoid the load balancer and issue queries like
>> > > >> >>> >>>
>> > > http://solr_server:8983/solr/collection/q=whatever&distrib=false
>> > > >> >>> >>>
>> > > >> >>> >>> the &distrib=false bit will keep SolrCloud from trying to send
>> > > >> >>> >>> the queries anywhere; they'll be served only from the node you
>> > > >> >>> >>> address them to.
>> > > >> >>> >>> That'll help check whether the nodes are consistent. You should
>> > > >> >>> >>> be getting back the same results from each replica in a shard
>> > > >> >>> >>> (i.e. 2 of your 6 machines).
>> > > >> >>> >>>
>> > > >> >>> >>> Next, try your failing query the same way.
>> > > >> >>> >>>
>> > > >> >>> >>> Next, try your failing query from a browser, pointing it at
>> > > >> successive
>> > > >> >>> >>> nodes.
>> > > >> >>> >>>
>> > > >> >>> >>> Where is the first place problems show up?
>> > > >> >>> >>>
>> > > >> >>> >>> My _guess_ is that your load balancer isn't quite doing
>> what
>> > you
>> > > >> >>> think, or
>> > > >> >>> >>> your cluster isn't set up the way you think it is, but
>> those
>> > are
>> > > >> >>> guesses.
>> > > >> >>> >>>
>> > > >> >>> >>> Best,
>> > > >> >>> >>> Erick
>> > > >> >>> >>>
>> > > >> >>> >>> On Thu, Oct 2, 2014 at 2:51 PM, S.L <
>> > simpleliving...@gmail.com>
>> > > >> wrote:
>> > > >> >>> >>> > Hi All,
>> > > >> >>> >>> >
>> > > >> >>> >>> > I am trying to query a 6 node Solr4.7  cluster with 3
>> shards
>> > > >> and  a
>> > > >> >>> >>> > replication factor of 2 .
>> > > >> >>> >>> >
>> > > >> >>> >>> > I have fronted these 6 Solr nodes using a load balancer. What I
>> > > >> >>> >>> > notice is that every time I do a search of the form
>> > > >> >>> >>> > q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf) it gives me a
>> > > >> >>> >>> > result only once in every 3 tries, telling me that the load
>> > > >> >>> >>> > balancer is distributing the requests between the 3 shards and
>> > > >> >>> >>> > SolrCloud only returns a result if the request goes to the core
>> > > >> >>> >>> > that has that id.
>> > > >> >>> >>> >
>> > > >> >>> >>> > However if I do a simple search like q=*:* , I
>> consistently
>> > > get
>> > > >> the
>> > > >> >>> >>> right
>> > > >> >>> >>> > aggregated results back of all the documents across all
>> the
>> > > >> shards
>> > > >> >>> for
>> > > >> >>> >>> > every request from the load balancer. Can someone please
>> let
>> > > me
>> > > >> know
>> > > >> >>> >>> what
>> > > >> >>> >>> > this is symptomatic of ?
>> > > >> >>> >>> >
>> > > >> >>> >>> > Somehow Solr Cloud seems to be doing search query
>> > distribution
>> > > >> and
>> > > >> >>> >>> > aggregation for queries of type *:* only.
>> > > >> >>> >>> >
>> > > >> >>> >>> > Thanks.
>> > > >> >>> >>>
>> > > >> >>> >>
>> > > >> >>> >>
>> > > >> >>>
>> > > >>
>> > >
>> >
>>
>
>
