Shawn,

Just wanted to follow up. I am still facing this issue of inconsistent search
results on SolrCloud 4.10.1. On digging further into the logs, I found a few
exceptions; the obvious ones are ZooKeeper connection timeouts, along with the
other exceptions below. Please take a look.
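
For reference, this is roughly how I have been comparing the two replicas of
shard2 (a minimal SolrJ sketch, not the exact code I run; the core URLs match
the clusterstate.json quoted further down, and the field name and row limit
are just illustrative):

    import java.util.HashSet;
    import java.util.Set;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrDocument;

    public class CompareShard2Replicas {

        // Fetch all ids from a single core, bypassing distributed search.
        static Set<String> fetchIds(String coreUrl) throws SolrServerException {
            HttpSolrServer core = new HttpSolrServer(coreUrl);
            SolrQuery q = new SolrQuery("*:*");
            q.setFields("id");
            q.setRows(200000);               // large enough to cover ~96K docs
            q.set("distrib", "false");       // query only this replica
            q.addSort("id", SolrQuery.ORDER.asc);
            Set<String> ids = new HashSet<String>();
            for (SolrDocument d : core.query(q).getResults()) {
                ids.add(String.valueOf(d.getFieldValue("id")));
            }
            core.shutdown();
            return ids;
        }

        public static void main(String[] args) throws Exception {
            Set<String> replica1 = fetchIds(
                "http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1");
            Set<String> replica2 = fetchIds(
                "http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2");
            Set<String> common = new HashSet<String>(replica1);
            common.retainAll(replica2);
            System.out.println("replica1=" + replica1.size()
                + " replica2=" + replica2.size()
                + " common=" + common.size());
        }
    }

HttpSolrServer is used here (rather than CloudSolrServer) so that each request
goes to exactly one core and is not re-routed; this is only meant to make the
distrib=false comparison mentioned further down in the thread concrete.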

*Logs*

/opt/tomcat1/logs/catalina.out:103651230 [http-bio-8081-exec-206] WARN
org.apache.solr.handler.ReplicationHandler  – Exception while writing
response for params:
file=_68v.fnm&command=filecontent&checksum=true&wt=filestream&qt=/replication&generation=2410
/opt/tomcat1/logs/catalina.out:java.nio.file.NoSuchFileException:
/opt/solr/home1/dyCollection1_shard2_replica1/data/index/_68v.fnm
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
/opt/tomcat1/logs/catalina.out:103651579 [http-bio-8081-exec-206] WARN
org.apache.solr.handler.ReplicationHandler  – Exception while writing
response for params:
file=_68v.fnm&command=filecontent&checksum=true&wt=filestream&qt=/replication&generation=2410
/opt/tomcat1/logs/catalina.out:java.nio.file.NoSuchFileException:
/opt/solr/home1/dyCollection1_shard2_replica1/data/index/_68v.fnm
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
/opt/tomcat1/logs/catalina.out:103651586 [http-bio-8081-exec-206] WARN
org.apache.solr.handler.ReplicationHandler  – Exception while writing
response for params:
file=_68v.fnm&command=filecontent&checksum=true&wt=filestream&qt=/replication&generation=2410
/opt/tomcat1/logs/catalina.out:java.nio.file.NoSuchFileException:
/opt/solr/home1/dyCollection1_shard2_replica1/data/index/_68v.fnm
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
/opt/tomcat1/logs/catalina.out:103651592 [http-bio-8081-exec-206] WARN
org.apache.solr.handler.ReplicationHandler  – Exception while writing
response for params:
file=_68v.fnm&command=filecontent&checksum=true&wt=filestream&qt=/replication&generation=2410
/opt/tomcat1/logs/catalina.out:java.nio.file.NoSuchFileException:
/opt/solr/home1/dyCollection1_shard2_replica1/data/index/_68v.fnm
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
/opt/tomcat1/logs/catalina.out:103651600 [http-bio-8081-exec-206] WARN
org.apache.solr.handler.ReplicationHandler  – Exception while writing
response for params:
file=_68v.fnm&command=filecontent&checksum=true&wt=filestream&qt=/replication&generation=2410
/opt/tomcat1/logs/catalina.out:java.nio.file.NoSuchFileException:
/opt/solr/home1/dyCollection1_shard2_replica1/data/index/_68v.fnm
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
/opt/tomcat1/logs/catalina.out:103651611 [http-bio-8081-exec-203] WARN
org.apache.solr.handler.ReplicationHandler  – Exception while writing
response for params:
file=_68v.fnm&command=filecontent&checksum=true&wt=filestream&qt=/replication&generation=2410
/opt/tomcat1/logs/catalina.out:java.nio.file.NoSuchFileException:
/opt/solr/home1/dyCollection1_shard2_replica1/data/index/_68v.fnm
/opt/tomcat1/logs/catalina.out: at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
471640118 [localhost-startStop-1-EventThread] INFO
org.apache.solr.common.cloud.ConnectionManager  – Watcher
org.apache.solr.common.cloud.ConnectionManager@2a7dcd74
name:ZooKeeperConnection Watcher:server1.mydomain.com:2181,
server2.mydomain.com:2181,server3.mydomain.com:2181 got event WatchedEvent
state:Disconnected type:None path:null path:null type:None
471640120 [localhost-startStop-1-EventThread] INFO
org.apache.solr.common.cloud.ConnectionManager  – zkClient has disconnected
471642457 [zkCallback-2-thread-8] INFO
org.apache.solr.cloud.DistributedQueue  – LatchChildWatcher fired on path:
null state: Expired type None
471642458 [localhost-startStop-1-EventThread] INFO
org.apache.solr.common.cloud.ConnectionManager  – Watcher
org.apache.solr.common.cloud.ConnectionManager@2a7dcd74
name:ZooKeeperConnection Watcher:server1.mydomain.com:2181,
server2.mydomain.com:2181,server3.mydomain.com:2181 got event WatchedEvent
state:Expired type:None path:null path:null type:None
471642458 [localhost-startStop-1-EventThread] INFO
org.apache.solr.common.cloud.ConnectionManager  – Our previous ZooKeeper
session was expired. Attempting to reconnect to recover relationship with
ZooKeeper...
471642458 [localhost-startStop-1-EventThread] INFO
org.apache.solr.cloud.Overseer  – Overseer
(id=164669836745768960-server1.mydomain.com:8081_solr-n_0000000019) closing
471642693
[OverseerCollectionProcessor-164669836745768960-server1.mydomain.com:8081_solr-n_0000000019]
INFO  org.apache.solr.cloud.OverseerCollectionProcessor  – According to ZK
I (id=164669836745768960-server1.mydomain.com:8081_solr-n_0000000019) am no
longer a leader.
471643178 
[OverseerStateUpdate-164669836745768960-server1.mydomain.com:8081_solr-n_0000000019]
INFO  org.apache.solr.cloud.Overseer  – Overseer Loop exiting :
server1.mydomain.com:8081_solr
471643727 [localhost-startStop-1-EventThread] INFO
org.apache.solr.common.cloud.DefaultConnectionStrategy  – Connection
expired - starting a new one...
471643963 [localhost-startStop-1-EventThread] INFO
org.apache.solr.common.cloud.ConnectionManager  – Waiting for client to
connect to ZooKeeper
471644368 [localhost-startStop-1-EventThread] INFO
org.apache.solr.common.cloud.ConnectionManager  – Watcher
org.apache.solr.common.cloud.ConnectionManager@2a7dcd74
name:ZooKeeperConnection Watcher:server1.mydomain.com:2181,
server2.mydomain.com:2181,server3.mydomain.com:2181 got event WatchedEvent
state:SyncConnected type:None path:null path:null type:None
471644463 [localhost-startStop-1-EventThread] INFO
org.apache.solr.common.cloud.ConnectionManager  – Client is connected to
ZooKeeper
471644464 [localhost-startStop-1-EventThread] INFO
org.apache.solr.common.cloud.ConnectionManager  – Connection with ZooKeeper
reestablished.
471644464 [localhost-startStop-1-EventThread] INFO
org.apache.solr.common.cloud.DefaultConnectionStrategy  – Reconnected to
ZooKeeper
471644464 [localhost-startStop-1-EventThread] INFO
org.apache.solr.common.cloud.ConnectionManager  – Connected:true
471644571 [OverseerExitThread] ERROR org.apache.solr.cloud.Overseer  –
could not read the data
*org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /overseer_elect/leader*
        at
org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
        at
org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:307)
        at
org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:304)
        at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
        at
org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:304)
        at
org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:320)
        at
org.apache.solr.cloud.Overseer$ClusterStateUpdater.access$300(Overseer.java:89)
        at
org.apache.solr.cloud.Overseer$ClusterStateUpdater$1.run(Overseer.java:292)
471644603 [Thread-2343] INFO  org.apache.solr.cloud.ZkController  –
publishing core=dyCollection1_shard2_replica1 state=down
collection=dyCollection1
471644878 [Thread-2343] INFO  org.apache.solr.cloud.ZkController  – Replica
core_node1 NOT in leader-initiated recovery, need to wait for leader to see
down state.
471645717 [Thread-2343] INFO  org.apache.solr.cloud.ElectionContext  –
canceling election
/overseer_elect/election/164669836745768960-server1.mydomain.com:8081
_solr-n_0000000019
471645742 [Thread-2343] WARN  org.apache.solr.cloud.ElectionContext  –
cancelElection did not find election node to remove
/overseer_elect/election/164669836745768960-server1.mydomain.com:8081
_solr-n_0000000019
471645869 [Thread-2343] INFO  org.apache.solr.common.cloud.ZkStateReader  –
Updating cluster state from ZooKeeper...
471646230 [Thread-2343] INFO  org.apache.solr.cloud.ZkController  –
Register node as live in ZooKeeper:/live_nodes/server1.mydomain.com:8081
_solr
471646277 [Thread-2343] INFO  org.apache.solr.common.cloud.SolrZkClient  –
makePath: /live_nodes/server1.mydomain.com:8081_solr
471646508 [Thread-2343] INFO  org.apache.solr.cloud.ZkController  –
Register replica - core:dyCollection1_shard2_replica1 address:
http://server1.mydomain.com:8081/solr collection:dyCollection1 shard:shard2
471646678 [Thread-2343] INFO  org.apache.solr.cloud.ElectionContext  –
canceling election
/collections/dyCollection1/leader_elect/shard2/election/164669836745768960-core_node1-n_0000000002
471646932 [Thread-2343] WARN  org.apache.solr.cloud.ElectionContext  –
cancelElection did not find election node to remove
/collections/dyCollection1/leader_elect/shard2/election/164669836745768960-core_node1-n_0000000002
471646972 [Thread-2343] INFO  org.apache.solr.cloud.ZkController  – We are
http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/ and
leader is
http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
471646972 [Thread-2343] INFO  org.apache.solr.cloud.ZkController  – No
LogReplay needed for core=dyCollection1_shard2_replica1 baseURL=
http://server1.mydomain.com:8081/solr
471646972 [Thread-2343] INFO  org.apache.solr.cloud.ZkController  – Core
needs to recover:dyCollection1_shard2_replica1
471646973 [Thread-2343] INFO  org.apache.solr.update.DefaultSolrCoreState
– Running recovery - first canceling any ongoing recovery
471647606 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
Starting recovery process.  core=dyCollection1_shard2_replica1
recoveringAfterStartup=true
471648601 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
####### Found new versions added after startup: num=33
471648628 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
###### currentVersions=[1482267976600125440, 1482267976541405184,
1482267964838248448, 1482267962649870336, 1482267919451684864,
1482267919392964608, 1482267918793179136, 1482267918732361728,
1482267868830629888, 1482267868770861056, 1482267866553122816,
1482267866495451136, 1482267855821996032, 1482267854691631104,
1482267848546975744, 1482267848487206912, 1482267838120984576,
1482267838058070016, 1482267833656147968, 1482267833596379136,
1482267819169021952, 1482267819110301696, 1482267819050532864,
1482267818987618304, 1482267814068748288, 1482267800491786240,
1482267795263586304, 1482267795202768896, 1482267780293066752,
1482267759067791360, 1482267730781405184, 1482267699959562240,
1482267699897696256]
471648628 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
###### startupVersions=[]
471648628 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
Publishing state of core dyCollection1_shard2_replica1 as recovering,
leader is
http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/ and I
am http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/
471648628 [RecoveryThread] INFO  org.apache.solr.cloud.ZkController  –
publishing core=dyCollection1_shard2_replica1 state=recovering
collection=dyCollection1
471648793 [zkCallback-2-thread-11] INFO
org.apache.solr.common.cloud.ZkStateReader  – A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged
path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
471649248 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
Sending prep recovery command to http://server3.mydomain.com:8081/solr;
WaitForState:
action=PREPRECOVERY&core=dyCollection1_shard2_replica2&nodeName=
server1.mydomain.com
%3A8081_solr&coreNodeName=core_node1&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true
471651448 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
Attempting to PeerSync from
http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
core=dyCollection1_shard2_replica1 - recoveringAfterStartup=true
471651690 [RecoveryThread] INFO  org.apache.solr.update.PeerSync  –
PeerSync: core=dyCollection1_shard2_replica1 url=
http://server1.mydomain.com:8081/solr START replicas=[
http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/]
nUpdates=100
471652187 [RecoveryThread] WARN  org.apache.solr.update.PeerSync  – no
frame of reference to tell if we've missed updates
471652187 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –*
PeerSync Recovery was not successful - trying replication.*
core=dyCollection1_shard2_replica1
471652187 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
Starting Replication Recovery. core=dyCollection1_shard2_replica1
471652187 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
Begin buffering updates. core=dyCollection1_shard2_replica1
471652471 [RecoveryThread] INFO  org.apache.solr.update.UpdateLog  –
Starting to buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
471652478 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
Attempting to replicate from
http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/.
core=dyCollection1_shard2_replica1
471653514 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller  –  No
value set for 'pollInterval'. Timer Task not started.
471653568 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller  –
Master's generation: 10685
471653568 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller  –
Slave's generation: 10713
471653569 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller  –
Starting replication process
471653943 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller  –
Number of files in latest index in master: 108
471653944 [RecoveryThread] INFO
org.apache.solr.core.CachingDirectoryFactory  – return new directory for
/opt/solr/home1/dyCollection1_shard2_replica1/data/index.20141018111139463
471654573 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller  –
Starting download to
NRTCachingDirectory(MMapDirectory@/opt/solr/home1/dyCollection1_shard2_replica1/data/index.20141018111139463
lockFactory=NativeFSLockFactory@/opt/solr/home1/dyCollection1_shard2_replica1/data/index.20141018111139463;
maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true
471834454 [zkCallback-2-thread-12] INFO
org.apache.solr.common.cloud.ZkStateReader  – A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged
path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
471897454 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller  –
Total time taken for download : 243 secs
471898551 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller  – New
index installed. Updating index properties... index=index.20141018111139463
471898932 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller  –
removing old index directory
NRTCachingDirectory(MMapDirectory@/opt/solr/home1/dyCollection1_shard2_replica1/data/index
lockFactory=NativeFSLockFactory@/opt/solr/home1/dyCollection1_shard2_replica1/data/index;
maxCacheMB=48.0 maxMergeSizeMB=4.0)
471898932 [RecoveryThread] INFO
org.apache.solr.update.DefaultSolrCoreState  – Creating new IndexWriter...
471898934 [RecoveryThread] INFO
org.apache.solr.update.DefaultSolrCoreState  – Waiting until IndexWriter is
unused... core=dyCollection1_shard2_replica1
471898934 [RecoveryThread] INFO
org.apache.solr.update.DefaultSolrCoreState  – Rollback old IndexWriter...
core=dyCollection1_shard2_replica1
471904192 [RecoveryThread] INFO  org.apache.solr.core.SolrCore  – New index
directory detected:
old=/opt/solr/home1/dyCollection1_shard2_replica1/data/index/
new=/opt/solr/home1/dyCollection1_shard2_replica1/data/index.20141018111139463
471904907 [RecoveryThread] INFO  org.apache.solr.core.SolrCore  –
SolrDeletionPolicy.onInit: commits: num=1
        
commit{dir=NRTCachingDirectory(MMapDirectory@/opt/solr/home1/dyCollection1_shard2_replica1/data/index.20141018111139463
lockFactory=NativeFSLockFactory@/opt/solr/home1/dyCollection1_shard2_replica1/data/index.20141018111139463;
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_88t,generation=10685}
471904907 [RecoveryThread] INFO  org.apache.solr.core.SolrCore  – newest
commit generation = 10685
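
In case it is relevant to the routing question raised below: documents are
added through CloudSolrServer pointed at the same ZooKeeper ensemble as in the
JVM args quoted below. A minimal sketch of what the indexing path looks like
(the id and field values are illustrative, not our actual data or schema):

    import java.io.IOException;

    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class AddDocument {
        public static void main(String[] args)
                throws SolrServerException, IOException {
            // Same zkHost string as the -DzkHost JVM argument quoted below.
            CloudSolrServer cloud = new CloudSolrServer(
                "server1.mydomain.com:2181,server2.mydomain.com:2181,server3.mydomain.com:2181");
            cloud.setDefaultCollection("dyCollection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "example-id-1");            // illustrative id
            doc.addField("suggestAggregate", "example");   // illustrative field
            cloud.add(doc);    // with the compositeId router, the shard is derived from the id
            cloud.commit();
            cloud.shutdown();
        }
    }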

On Fri, Oct 17, 2014 at 1:12 PM, S.L <simpleliving...@gmail.com> wrote:

> Shawn,
>
> Just wondering if you have any other suggestions on what the next steps
> would be. Thanks.
>
> On Thu, Oct 16, 2014 at 11:12 PM, S.L <simpleliving...@gmail.com> wrote:
>
>> Shawn ,
>>
>>
>>    1. I will upgrade to JVM build 67 shortly.
>>    2. This is a new collection; I was facing a similar issue in 4.7 and,
>>    based on Erick's recommendation, I upgraded to 4.10.1 and created a new
>>    collection.
>>    3. Yes, I am hitting the replicas of the same shard, and I see the
>>    lists are completely non-overlapping. I am using CloudSolrServer to add
>>    the documents.
>>    4. I have a 3-node physical cluster, with each node having 16GB of memory.
>>    5. I also have a custom request handler defined in my solrconfig.xml
>>    as below. However, I am not using it; I am only using the default
>>    select handler. My MyCustomHandler class has been added to the source
>>    and included in the build, but it is not being used for any requests yet.
>>
>>   <requestHandler name="/mycustomselect" class="solr.MyCustomHandler"
>> startup="lazy">
>>     <lst name="defaults">
>>       <str name="df">suggestAggregate</str>
>>
>>       <str name="spellcheck.dictionary">direct</str>
>>       <!--<str name="spellcheck.dictionary">wordbreak</str>-->
>>       <str name="spellcheck">on</str>
>>       <str name="spellcheck.extendedResults">true</str>
>>       <str name="spellcheck.count">10</str>
>>       <str name="spellcheck.alternativeTermCount">5</str>
>>       <str name="spellcheck.maxResultsForSuggest">5</str>
>>       <str name="spellcheck.collate">true</str>
>>       <str name="spellcheck.collateExtendedResults">true</str>
>>       <str name="spellcheck.maxCollationTries">10</str>
>>       <str name="spellcheck.maxCollations">5</str>
>>     </lst>
>>     <arr name="last-components">
>>       <str>spellcheck</str>
>>     </arr>
>>   </requestHandler>
>>
>>
>>             6. The clusterstate.json is copied below:
>>
>>                     {"dyCollection1":{
>>     "shards":{
>>       "shard1":{
>>         "range":"80000000-d554ffff",
>>         "state":"active",
>>         "replicas":{
>>           "core_node3":{
>>             "state":"active",
>>             "core":"dyCollection1_shard1_replica1",
>>             "node_name":"server3.mydomain.com:8082_solr",
>>             "base_url":"http://server3.mydomain.com:8082/solr"},
>>           "core_node4":{
>>             "state":"active",
>>             "core":"dyCollection1_shard1_replica2",
>>             "node_name":"server2.mydomain.com:8081_solr",
>>             "base_url":"http://server2.mydomain.com:8081/solr",
>>             "leader":"true"}}},
>>       "shard2":{
>>         "range":"d5550000-2aa9ffff",
>>         "state":"active",
>>         "replicas":{
>>           "core_node1":{
>>             "state":"active",
>>             "core":"dyCollection1_shard2_replica1",
>>             "node_name":"server1.mydomain.com:8081_solr",
>>             "base_url":"http://server1.mydomain.com:8081/solr",
>>             "leader":"true"},
>>           "core_node6":{
>>             "state":"active",
>>             "core":"dyCollection1_shard2_replica2",
>>             "node_name":"server3.mydomain.com:8081_solr",
>>             "base_url":"http://server3.mydomain.com:8081/solr"}}},
>>       "shard3":{
>>         "range":"2aaa0000-7fffffff",
>>         "state":"active",
>>         "replicas":{
>>           "core_node2":{
>>             "state":"active",
>>             "core":"dyCollection1_shard3_replica2",
>>             "node_name":"server1.mydomain.com:8082_solr",
>>             "base_url":"http://server1.mydomain.com:8082/solr",
>>             "leader":"true"},
>>           "core_node5":{
>>             "state":"active",
>>             "core":"dyCollection1_shard3_replica1",
>>             "node_name":"server2.mydomain.com:8082_solr",
>>             "base_url":"http://server2.mydomain.com:8082/solr"}}}},
>>     "maxShardsPerNode":"1",
>>     "router":{"name":"compositeId"},
>>     "replicationFactor":"2",
>>     "autoAddReplicas":"false"}}
>>
>>   Thanks!
>>
>> On Thu, Oct 16, 2014 at 9:02 PM, Shawn Heisey <apa...@elyograg.org>
>> wrote:
>>
>>> On 10/16/2014 6:27 PM, S.L wrote:
>>>
>>>> 1. Java version: java version "1.7.0_51"
>>>> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>>>>
>>>
>>> I believe that build 51 is one of those that is known to have bugs
>>> related to Lucene.  If you can upgrade this to 67, that would be good, but
>>> I don't know that it's a pressing matter.  It looks like the Oracle JVM,
>>> which is good.
>>>
>>>  2. OS
>>>> CentOS Linux release 7.0.1406 (Core)
>>>>
>>>> 3. Everything is 64-bit: OS, Java, and CPU.
>>>>
>>>> 4. Java Args.
>>>>      -Djava.io.tmpdir=/opt/tomcat1/temp
>>>>      -Dcatalina.home=/opt/tomcat1
>>>>      -Dcatalina.base=/opt/tomcat1
>>>>      -Djava.endorsed.dirs=/opt/tomcat1/endorsed
>>>>      -DzkHost=server1.mydomain.com:2181,server2.mydomain.com:2181,
>>>> server3.mydomain.com:2181
>>>>      -DzkClientTimeout=20000
>>>>      -DhostContext=solr
>>>>      -Dport=8081
>>>>      -Dhost=server1.mydomain.com
>>>>      -Dsolr.solr.home=/opt/solr/home1
>>>>      -Dfile.encoding=UTF8
>>>>      -Duser.timezone=UTC
>>>>      -XX:+UseG1GC
>>>>      -XX:MaxPermSize=128m
>>>>      -XX:PermSize=64m
>>>>      -Xmx2048m
>>>>      -Xms128m
>>>>      -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>>>>      -Djava.util.logging.config.file=/opt/tomcat1/conf/
>>>> logging.properties
>>>>
>>>
>>> I would not use the G1 collector myself, but with the heap at only 2GB,
>>> I don't know that it matters all that much.  Even a worst-case collection
>>> probably is not going to take more than a few seconds, and you've already
>>> increased the zookeeper client timeout.
>>>
>>> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>>>
>>>  5. The ZooKeeper ensemble has 3 ZooKeeper instances, which are external
>>>> and not embedded.
>>>>
>>>>
>>>> 6. Container: I am using Apache Tomcat version 7.0.42
>>>>
>>>> *Additional Observations:*
>>>>
>>>> I queried all docs on both replicas with distrib=false&fl=id&sort=id+asc,
>>>> then compared the two lists. Eyeballing the first few lines of ids in
>>>> both lists, I could see that even though each list has an equal number
>>>> of documents (96309 each), the document ids in them seem to be *mutually
>>>> exclusive*. I did not find even a single common id in those lists (I
>>>> tried at least 15 manually); it looks to me like the replicas are
>>>> disjoint sets.
>>>>
>>>
>>> Are you sure you hit both replicas of the same shard number?  If you
>>> are, then it sounds like something is going wrong with your document
>>> routing, or maybe your clusterstate is really messed up.  Recreating the
>>> collection from scratch and doing a full reindex might be a good plan ...
>>> assuming this is possible for you.  You could create a whole new
>>> collection, and then when you're ready to switch, delete the original
>>> collection and create an alias so your app can still use the old name.
>>>
>>> How much total RAM do you have on these systems, and how large are those
>>> index shards?  With a shard having 96K documents, it sounds like your whole
>>> index is probably just shy of 300K documents.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>
>
