Shawn,

Please find the answers to your questions below.

1. Java version:

   java version "1.7.0_51"
   Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
   Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

2. OS: CentOS Linux release 7.0.1406 (Core)

3. Everything is 64-bit: OS, Java, and CPU.

4. Java args:

   -Djava.io.tmpdir=/opt/tomcat1/temp
   -Dcatalina.home=/opt/tomcat1
   -Dcatalina.base=/opt/tomcat1
   -Djava.endorsed.dirs=/opt/tomcat1/endorsed
   -DzkHost=server1.mydomain.com:2181,server2.mydomain.com:2181,server3.mydomain.com:2181
   -DzkClientTimeout=20000
   -DhostContext=solr
   -Dport=8081
   -Dhost=server1.mydomain.com
   -Dsolr.solr.home=/opt/solr/home1
   -Dfile.encoding=UTF8
   -Duser.timezone=UTC
   -XX:+UseG1GC
   -XX:MaxPermSize=128m
   -XX:PermSize=64m
   -Xmx2048m
   -Xms128m
   -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
   -Djava.util.logging.config.file=/opt/tomcat1/conf/logging.properties

5. The ZooKeeper ensemble has 3 ZooKeeper instances, which are external and
   not embedded.

6. Container: I am using Apache Tomcat version 7.0.42.

*Additional Observations:*

I queried all docs on both replicas with distrib=false&fl=id&sort=id+asc and
then compared the two lists. Each list has the same number of documents
(96309), but the document ids in them seem to be *mutually exclusive*. I
checked at least 15 ids from the first few lines of both lists by hand and
did not find a single common id, so it looks to me like the replicas are
disjoint sets.

Thanks.

On Thu, Oct 16, 2014 at 1:41 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/15/2014 10:24 PM, S.L wrote:
>
>> Yes , I tried those two queries with distrib=false , I get 0 results for
>> first and 1 result for the second query( (i.e. server 3 shard 2 replica
>> 2) consistently.
>>
>> However if I run the same second query (i.e. server 3 shard 2 replica 2)
>> with distrib=true, I sometimes get a result and sometimes not , should'nt
>> this query always return a result when its pointing to a core that seems
>> to have that document regardless of distrib=true or false ?
>>
>> Unfortunately I dont see anything particular in the logs to point to any
>> information.
>>
>> BTW you asked me to replace the request handler , I use the select request
>> handler ,so I cannot replace it with anything else , is that a problem ?
>>
>
> If you send the query with distrib=true (which is the default value in
> SolrCloud), then it treats it just as if you had sent it to
> /solr/collection instead of /solr/collection_shardN_replicaN, so it's a
> full distributed query. The distrib=false is required to turn that behavior
> off and ONLY query the index on the actual core where you sent it.
>
> I only said to replace those things as appropriate. Since you are using
> /select, it's no problem that you left it that way. If I were to assume
> that you used /select, but you didn't, the URLs as I wrote them might not
> have worked.
>
> As discussed, this means that your replicas are truly out of sync. It's
> difficult to know what caused it, especially if you can't see anything in
> the log when you indexed the missing documents.
>
> We know you're on Solr 4.10.1. This means that your Java is a 1.7
> version, since Java7 is required.
>
> Here's where I ask a whole lot of questions about your setup. What is the
> precise Java version, and which vendor's Java are you using? What
> operating system is it on? Is everything 64-bit, or is any piece (CPU, OS,
> Java) 32-bit? On the Solr admin UI dashboard, it lists all parameters used
> when starting Java, labelled as "Args". Can you include those? Is
> zookeeper external, or embedded in Solr? Is it a 3-server (or more)
> ensemble? Are you using the example jetty, or did you provide your own
> servlet container?
>
> We recommend 64-bit Oracle Java, the latest 1.7 version. OpenJDK (since
> version 1.7.x) should be pretty safe as well, but IBM's Java should be
> avoided. IBM does very aggressive runtime optimizations. These can make
> programs run faster, but they are known to negatively affect Lucene/Solr.
>
> Thanks,
> Shawn
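
For reference, the replica comparison mentioned under "Additional
Observations" could be done programmatically rather than by eyeballing the
lists. The following is only a rough sketch: the core name
collection_shard2_replica1 is a hypothetical placeholder following the
/solr/collection_shardN_replicaN pattern, the q=*:*, rows and wt=json
parameters are assumptions added to make the request complete, and the
function names are made up for illustration. Fetching the responses and
extracting the id lists is left out.

```python
from urllib.parse import urlencode


def replica_query_url(core_url, rows):
    """Build a /select URL that queries only the given core (distrib=false)
    and returns just the id field, sorted by id."""
    params = {
        "q": "*:*",          # assumption: match all documents
        "distrib": "false",  # query only this core, not the whole collection
        "fl": "id",
        "sort": "id asc",    # urlencode turns the space into '+'
        "rows": rows,        # assumption: set high enough to cover all docs
        "wt": "json",        # assumption: JSON response format
    }
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def compare_replicas(ids_a, ids_b):
    """Count ids shared by both replicas and ids unique to each one."""
    a, b = set(ids_a), set(ids_b)
    return {
        "common": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }


if __name__ == "__main__":
    # Hypothetical core URL; host, port and context taken from the Java args.
    core = "http://server1.mydomain.com:8081/solr/collection_shard2_replica1"
    print(replica_query_url(core, 100000))
    # Toy id lists standing in for the two replica result sets.
    print(compare_replicas(["a", "b", "c"], ["c", "d"]))
```

If the two replicas of a shard were in sync, "only_a" and "only_b" would both
be 0. Fully disjoint lists, as observed here, would show "common" as 0.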