Shawn, just wondering if you have any other suggestions on what the next steps would be? Thanks.
On Thu, Oct 16, 2014 at 11:12 PM, S.L <simpleliving...@gmail.com> wrote:
> Shawn,
>
> 1. I will upgrade to the build 67 JVM shortly.
>
> 2. This is a new collection. I was facing a similar issue in 4.7, and
> based on Erick's recommendation I upgraded to 4.10.1 and created a new
> collection.
>
> 3. Yes, I am hitting the replicas of the same shard, and I see the
> lists are completely non-overlapping. I am using CloudSolrServer to add
> the documents (a rough sketch of the indexing path follows the
> clusterstate below).
>
> 4. I have a 3-node physical cluster, with each node having 16GB of
> memory.
>
> 5. I also have a custom request handler defined in my solrconfig.xml as
> below. However, I am not using it; I am only using the default select
> handler. My MyCustomHandler class has been added to the source and
> included in the build, but it is not being used for any requests yet.
>
> <requestHandler name="/mycustomselect" class="solr.MyCustomHandler"
>                 startup="lazy">
>   <lst name="defaults">
>     <str name="df">suggestAggregate</str>
>
>     <str name="spellcheck.dictionary">direct</str>
>     <!--<str name="spellcheck.dictionary">wordbreak</str>-->
>     <str name="spellcheck">on</str>
>     <str name="spellcheck.extendedResults">true</str>
>     <str name="spellcheck.count">10</str>
>     <str name="spellcheck.alternativeTermCount">5</str>
>     <str name="spellcheck.maxResultsForSuggest">5</str>
>     <str name="spellcheck.collate">true</str>
>     <str name="spellcheck.collateExtendedResults">true</str>
>     <str name="spellcheck.maxCollationTries">10</str>
>     <str name="spellcheck.maxCollations">5</str>
>   </lst>
>   <arr name="last-components">
>     <str>spellcheck</str>
>   </arr>
> </requestHandler>
>
> 6. The clusterstate.json is copied below:
>
> {"dyCollection1":{
>   "shards":{
>     "shard1":{
>       "range":"80000000-d554ffff",
>       "state":"active",
>       "replicas":{
>         "core_node3":{
>           "state":"active",
>           "core":"dyCollection1_shard1_replica1",
>           "node_name":"server3.mydomain.com:8082_solr",
>           "base_url":"http://server3.mydomain.com:8082/solr"},
>         "core_node4":{
>           "state":"active",
>           "core":"dyCollection1_shard1_replica2",
>           "node_name":"server2.mydomain.com:8081_solr",
>           "base_url":"http://server2.mydomain.com:8081/solr",
>           "leader":"true"}}},
>     "shard2":{
>       "range":"d5550000-2aa9ffff",
>       "state":"active",
>       "replicas":{
>         "core_node1":{
>           "state":"active",
>           "core":"dyCollection1_shard2_replica1",
>           "node_name":"server1.mydomain.com:8081_solr",
>           "base_url":"http://server1.mydomain.com:8081/solr",
>           "leader":"true"},
>         "core_node6":{
>           "state":"active",
>           "core":"dyCollection1_shard2_replica2",
>           "node_name":"server3.mydomain.com:8081_solr",
>           "base_url":"http://server3.mydomain.com:8081/solr"}}},
>     "shard3":{
>       "range":"2aaa0000-7fffffff",
>       "state":"active",
>       "replicas":{
>         "core_node2":{
>           "state":"active",
>           "core":"dyCollection1_shard3_replica2",
>           "node_name":"server1.mydomain.com:8082_solr",
>           "base_url":"http://server1.mydomain.com:8082/solr",
>           "leader":"true"},
>         "core_node5":{
>           "state":"active",
>           "core":"dyCollection1_shard3_replica1",
>           "node_name":"server2.mydomain.com:8082_solr",
>           "base_url":"http://server2.mydomain.com:8082/solr"}}}},
>   "maxShardsPerNode":"1",
>   "router":{"name":"compositeId"},
>   "replicationFactor":"2",
>   "autoAddReplicas":"false"}}
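>
> For reference, the indexing path mentioned in item 3 looks roughly like
> the following minimal SolrJ sketch. The field values are placeholders,
> not my real documents, and exception handling is omitted:
>
> import org.apache.solr.client.solrj.impl.CloudSolrServer;
> import org.apache.solr.common.SolrInputDocument;
>
> public class IndexSketch {
>     public static void main(String[] args) throws Exception {
>         // Same ZooKeeper ensemble as in the -DzkHost setting below
>         CloudSolrServer server = new CloudSolrServer(
>             "server1.mydomain.com:2181,server2.mydomain.com:2181,server3.mydomain.com:2181");
>         server.setDefaultCollection("dyCollection1");
>
>         SolrInputDocument doc = new SolrInputDocument();
>         doc.addField("id", "some-unique-id");  // placeholder value
>         server.add(doc);   // routed to a shard by compositeId hashing of "id"
>         server.commit();
>         server.shutdown();
>     }
> }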
>
> Thanks!
>
> On Thu, Oct 16, 2014 at 9:02 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>> On 10/16/2014 6:27 PM, S.L wrote:
>>> 1. Java version:
>>> java version "1.7.0_51"
>>> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
>>> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>>
>> I believe that build 51 is one of those that is known to have bugs
>> related to Lucene. If you can upgrade this to 67, that would be good,
>> but I don't know that it's a pressing matter. It looks like the Oracle
>> JVM, which is good.
>>
>>> 2. OS:
>>> CentOS Linux release 7.0.1406 (Core)
>>>
>>> 3. Everything is 64-bit: OS, Java, and CPU.
>>>
>>> 4. Java args:
>>> -Djava.io.tmpdir=/opt/tomcat1/temp
>>> -Dcatalina.home=/opt/tomcat1
>>> -Dcatalina.base=/opt/tomcat1
>>> -Djava.endorsed.dirs=/opt/tomcat1/endorsed
>>> -DzkHost=server1.mydomain.com:2181,server2.mydomain.com:2181,server3.mydomain.com:2181
>>> -DzkClientTimeout=20000
>>> -DhostContext=solr
>>> -Dport=8081
>>> -Dhost=server1.mydomain.com
>>> -Dsolr.solr.home=/opt/solr/home1
>>> -Dfile.encoding=UTF8
>>> -Duser.timezone=UTC
>>> -XX:+UseG1GC
>>> -XX:MaxPermSize=128m
>>> -XX:PermSize=64m
>>> -Xmx2048m
>>> -Xms128m
>>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>>> -Djava.util.logging.config.file=/opt/tomcat1/conf/logging.properties
>>
>> I would not use the G1 collector myself, but with the heap at only 2GB,
>> I don't know that it matters all that much. Even a worst-case
>> collection probably is not going to take more than a few seconds, and
>> you've already increased the zookeeper client timeout.
>>
>> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>>
>>> 5. The Zookeeper ensemble has 3 zookeeper instances, which are
>>> external, not embedded.
>>>
>>> 6. Container: I am using Apache Tomcat version 7.0.42.
>>>
>>> *Additional observations:*
>>>
>>> I queried all docs on both replicas with
>>> distrib=false&fl=id&sort=id+asc, then compared the two lists.
>>> Eyeballing the first few lines of ids in each list, I could see that
>>> even though the lists contain an equal number of documents (96309
>>> each), the document ids in them seem to be *mutually exclusive*. I did
>>> not find even a single common id; I tried at least 15 manually. It
>>> looks to me like the replicas are disjoint sets.
>>
>> Are you sure you hit both replicas of the same shard number? If you
>> are, then it sounds like something is going wrong with your document
>> routing, or maybe your clusterstate is really messed up. Recreating
>> the collection from scratch and doing a full reindex might be a good
>> plan ... assuming this is possible for you. You could create a whole
>> new collection, and then when you're ready to switch, delete the
>> original collection and create an alias so your app can still use the
>> old name.
>>
>> How much total RAM do you have on these systems, and how large are
>> those index shards? With a shard having 96K documents, it sounds like
>> your whole index is probably just shy of 300K documents.
>>
>> Thanks,
>> Shawn
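
P.S. For what it's worth, here is a rough SolrJ sketch of how that
per-replica id comparison could be scripted instead of eyeballed. The
core URLs are taken from the clusterstate above; the rows value and the
assumption that id is a string field are placeholders for my setup:

import java.util.HashSet;
import java.util.Set;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class CompareReplicas {
    // Fetch all ids from one core directly, bypassing distributed search.
    static Set<String> fetchIds(String coreUrl) throws Exception {
        HttpSolrServer server = new HttpSolrServer(coreUrl);
        SolrQuery q = new SolrQuery("*:*");
        q.set("distrib", "false");            // query only this core
        q.setFields("id");
        q.setSort("id", SolrQuery.ORDER.asc);
        q.setRows(200000);                    // enough to cover ~96K docs
        Set<String> ids = new HashSet<String>();
        for (SolrDocument d : server.query(q).getResults()) {
            ids.add((String) d.getFieldValue("id"));  // assumes string ids
        }
        server.shutdown();
        return ids;
    }

    public static void main(String[] args) throws Exception {
        Set<String> a = fetchIds(
            "http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1");
        Set<String> b = fetchIds(
            "http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2");
        Set<String> common = new HashSet<String>(a);
        common.retainAll(b);                  // intersection of the two id sets
        System.out.println("replica1=" + a.size() + " replica2=" + b.size()
                + " common=" + common.size());
    }
}

A common count of 0 would confirm the disjoint-replica observation for
shard1 in one shot, and the same check can be repeated for the other
two shards.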