On 10/16/2014 6:27 PM, S.L wrote:
1. Java Version: java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

I believe that build 51 is one of the releases known to have bugs that affect Lucene. If you can upgrade to 67, that would be good, but I don't know that it's a pressing matter. It looks like you're running the Oracle JVM, which is good.

2. OS
CentOS Linux release 7.0.1406 (Core)

3. Everything is 64-bit: OS, Java, and CPU.

4. Java Args.
     -Djava.io.tmpdir=/opt/tomcat1/temp
     -Dcatalina.home=/opt/tomcat1
     -Dcatalina.base=/opt/tomcat1
     -Djava.endorsed.dirs=/opt/tomcat1/endorsed
     -DzkHost=server1.mydomain.com:2181,server2.mydomain.com:2181,server3.mydomain.com:2181
     -DzkClientTimeout=20000
     -DhostContext=solr
     -Dport=8081
     -Dhost=server1.mydomain.com
     -Dsolr.solr.home=/opt/solr/home1
     -Dfile.encoding=UTF8
     -Duser.timezone=UTC
     -XX:+UseG1GC
     -XX:MaxPermSize=128m
     -XX:PermSize=64m
     -Xmx2048m
     -Xms128m
     -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
     -Djava.util.logging.config.file=/opt/tomcat1/conf/logging.properties

I would not use the G1 collector myself, but with the heap at only 2GB, I don't know that it matters all that much. Even a worst-case collection probably is not going to take more than a few seconds, and you've already increased the zookeeper client timeout.

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
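
For what it's worth, a CMS-based alternative to G1 looks something like the following. This is only a sketch: the occupancy fraction and generation sizing here are guesses that need to be tuned against your own GC logs, not drop-in values.

     -XX:+UseConcMarkSweepGC
     -XX:+UseParNewGC
     -XX:+CMSParallelRemarkEnabled
     -XX:+ParallelRefProcEnabled
     -XX:+UseCMSInitiatingOccupancyOnly
     -XX:CMSInitiatingOccupancyFraction=70
     -XX:NewRatio=3

You would drop -XX:+UseG1GC in favor of these.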

5. Zookeeper ensemble has 3 zookeeper instances, which are external and
are not embedded.


6. Container: I am using Apache Tomcat version 7.0.42

*Additional Observations:*

I queried all docs on both replicas with distrib=false&fl=id&sort=id+asc,
then compared the two lists. Eyeballing the first few lines of ids in both
lists, I could see that even though each list has an equal number of
documents (96309 each), the document ids in them seem to be *mutually
exclusive*. I did not find even a single common id in those lists, and I
tried at least 15 manually. It looks to me like the replicas are disjoint
sets.

Are you sure you hit both replicas of the same shard number? If you are, then it sounds like something is going wrong with your document routing, or maybe your clusterstate is really messed up. Recreating the collection from scratch and doing a full reindex might be a good plan ... assuming this is possible for you. You could create a whole new collection, and then when you're ready to switch, delete the original collection and create an alias so your app can still use the old name.
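
To rule out a routing problem, you can query each core of one shard directly with distrib=false and compare numFound. I'm guessing at the core names here since I don't know your collection name, but with the default SolrCloud naming convention it would be something like:

     http://server1.mydomain.com:8081/solr/mycollection_shard1_replica1/select?q=*:*&distrib=false&rows=0
     http://server2.mydomain.com:8081/solr/mycollection_shard1_replica2/select?q=*:*&distrib=false&rows=0

If you do rebuild, the Collections API handles the swap. The names and parameter values below are placeholders, not values for your specific setup:

     /admin/collections?action=CREATE&name=mycollection2&numShards=3&replicationFactor=2
     /admin/collections?action=DELETE&name=mycollection
     /admin/collections?action=CREATEALIAS&name=mycollection&collections=mycollection2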

How much total RAM do you have on these systems, and how large are those index shards? With a shard having 96K documents, it sounds like your whole index is probably just shy of 300K documents.
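
If you're not sure about the on-disk index size, something like this will show it (assuming the standard data directory layout under your solr home):

     du -sh /opt/solr/home1/*/data/index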

Thanks,
Shawn
