Hi,

In our SolrCloud cluster (2 shards, 8 replicas), the replicas go from time to time into recovering state, and it takes more than 10 minutes to finish to recover.

In logs, we see that "PeerSync Recovery" fails with the message :

PeerSync: core=fr_green url="" class="moz-txt-link-freetext" href="http://solr-08/searchsolrnodefr">http://solr-08/searchsolrnodefr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates

Then "Replication Recovery" starts.

Is there something we can do to avoid the failure of "Peer Recovery" so that the recovery process is more rapid (less than 10 minutes) ?

The full trace log is here :

2013-12-05 13:51:53,740 [http-8080-46] INFO  org.apache.solr.handler.admin.CoreAdminHandler:handleRequestRecoveryAction:705  - It has been requested that we recover
2013-12-05 13:51:53,740 [http-8080-112] INFO  org.apache.solr.handler.admin.CoreAdminHandler:handleRequestRecoveryAction:705  - It has been requested that we recover
2013-12-05 13:51:53,740 [http-8080-112] INFO  org.apache.solr.servlet.SolrDispatchFilter:handleAdminRequest:658  - [admin] webapp=null path=/admin/cores params={action="" status=0 QTime=0
2013-12-05 13:51:53,740 [Thread-1544] INFO  org.apache.solr.cloud.ZkController:publish:1017  - publishing core=fr_green state=recovering
2013-12-05 13:51:53,741 [http-8080-46] INFO  org.apache.solr.servlet.SolrDispatchFilter:handleAdminRequest:658  - [admin] webapp=null path=/admin/cores params={action="" status=0 QTime=1
2013-12-05 13:51:53,740 [Thread-1543] INFO  org.apache.solr.cloud.ZkController:publish:1017  - publishing core=fr_green state=recovering
2013-12-05 13:51:53,743 [Thread-1544] INFO  org.apache.solr.cloud.ZkController:publish:1021  - numShards not found on descriptor - reading it from system property
2013-12-05 13:51:53,746 [Thread-1543] INFO  org.apache.solr.cloud.ZkController:publish:1021  - numShards not found on descriptor - reading it from system property
2013-12-05 13:51:53,755 [Thread-1543] WARN  org.apache.solr.cloud.RecoveryStrategy:close:105  - Stopping recovery for zkNodeName=solr-08_searchsolrnodefr_fr_greencore=fr_green
2013-12-05 13:51:53,756 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy:run:216  - Starting recovery process.  core=fr_green recoveringAfterStartup=false
2013-12-05 13:51:53,762 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy:doRecovery:495  - Finished recovery process. core=fr_green
2013-12-05 13:51:53,762 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy:run:216  - Starting recovery process.  core=fr_green recoveringAfterStartup=false
2013-12-05 13:51:53,765 [RecoveryThread] INFO  org.apache.solr.cloud.ZkController:publish:1017  - publishing core=fr_green state=recovering
2013-12-05 13:51:53,765 [RecoveryThread] INFO  org.apache.solr.cloud.ZkController:publish:1021  - numShards not found on descriptor - reading it from system property
2013-12-05 13:51:53,767 [RecoveryThread] INFO  org.apache.solr.client.solrj.impl.HttpClientUtil:createClient:103  - Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
2013-12-05 13:51:54,777 [main-EventThread] INFO  org.apache.solr.common.cloud.ZkStateReader:process:210  - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 18)
2013-12-05 13:51:56,804 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy:doRecovery:356  - Attempting to PeerSync from http://solr-02/searchsolrnodefr/fr_green/ core=fr_green - recoveringAfterStartup=false
2013-12-05 13:51:56,806 [RecoveryThread] WARN  org.apache.solr.update.PeerSync:sync:232  - PeerSync: core=fr_green url="" class="moz-txt-link-freetext" href="http://solr-08/searchsolrnodefr">http://solr-08/searchsolrnodefr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates
2013-12-05 13:51:56,806 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy:doRecovery:394  - PeerSync Recovery was not successful - trying replication. core=fr_green

2013-12-05 13:51:56,806 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy:doRecovery:397  - Starting Replication Recovery. core=fr_green
2013-12-05 13:51:56,806 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy:doRecovery:399  - Begin buffering updates. core=fr_green
2013-12-05 13:51:56,806 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy:replicate:127  - Attempting to replicate from http://solr-02/searchsolrnodefr/fr_green/. core=fr_green
2013-12-05 13:51:56,806 [RecoveryThread] INFO  org.apache.solr.client.solrj.impl.HttpClientUtil:createClient:103  - Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
2013-12-05 13:52:01,203 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller:<init>:211  -  No value set for 'pollInterval'. Timer Task not started.
2013-12-05 13:52:01,209 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller:fetchLatestIndex:369  - Master's generation: 4382
2013-12-05 13:52:01,209 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller:fetchLatestIndex:370  - Slave's generation: 4382
2013-12-05 13:52:01,209 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller:fetchLatestIndex:371  - Starting replication process
2013-12-05 13:52:01,213 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller:fetchLatestIndex:376  - Number of files in latest index in master: 158
2013-12-05 13:52:01,213 [RecoveryThread] INFO  org.apache.solr.core.CachingDirectoryFactory:get:360  - return new directory for /opt/kookel/data/searchSolrNode/solrindex/fr1_green/index.20131205135201213
2013-12-05 13:52:01,213 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller:fetchLatestIndex:402  - Starting download to org.apache.lucene.store.MMapDirectory@/opt/kookel/data/searchSolrNode/solrindex/fr1_green/index.20131205135201213 lockFactory=org.apache.lucene.store.NativeFSLockFactory@697d36b7 fullCopy=true
2013-12-05 13:52:21,970 [main-EventThread] INFO  org.apache.solr.common.cloud.ZkStateReader:process:210  - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 18)
2013-12-05 13:55:15,039 [main-EventThread] INFO  org.apache.solr.common.cloud.ZkStateReader:process:210  - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 18)
2013-12-05 13:56:27,160 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller:fetchLatestIndex:406  - Total time taken for download : 265 secs
2013-12-05 13:56:27,540 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller:modifyIndexProps:886  - New index installed. Updating index properties... index=index.20131205135201213
2013-12-05 13:56:27,546 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller:fetchLatestIndex:461  - removing old index directory org.apache.lucene.store.MMapDirectory@/opt/kookel/data/searchSolrNode/solrindex/fr1_green/index.20131201033712724 lockFactory=org.apache.lucene.store.NativeFSLockFactory@53949838
2013-12-05 13:56:27,765 [RecoveryThread] INFO  org.apache.solr.search.SolrIndexSearcher:<init>:197  - Opening Searcher@30a544f3 main
2013-12-05 13:58:52,337 [RecoveryThread] INFO  org.apache.solr.core.CachingDirectoryFactory:closeCacheValue:235  - looking to close /opt/kookel/data/searchSolrNode/solrindex/fr1_green/index.20131201033712724[CachedDir<<refCount=0;path=/opt/kookel/data/searchSolrNode/solrindex/fr1_green/index.20131201033712724;done=true>>]
2013-12-05 13:58:52,337 [RecoveryThread] INFO  org.apache.solr.core.CachingDirectoryFactory:close:304  - Closing directory: /opt/kookel/data/searchSolrNode/solrindex/fr1_green/index.20131201033712724
2013-12-05 13:58:52,337 [RecoveryThread] INFO  org.apache.solr.core.CachingDirectoryFactory:closeCacheValue:279  - Removing directory before core close: /opt/kookel/data/searchSolrNode/solrindex/fr1_green/index.20131201033712724
2013-12-05 13:58:54,172 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy:replay:506  - Replaying buffered documents. core=fr_green
2013-12-05 13:58:54,172 [recoveryExecutor-7-thread-2] WARN  org.apache.solr.update.UpdateLog:doReplay:1240  - Starting log replay tlog{file=/opt/kookel/data/searchSolrNode/solrindex/fr1_green/tlog/tlog.0000000000000004091 refcount=2} active=true starting pos=1103833347
2013-12-05 13:59:16,660 [recoveryExecutor-7-thread-2] INFO  org.apache.solr.search.SolrIndexSearcher:<init>:197  - Opening Searcher@5e48fb12 main
2013-12-05 14:01:11,908 [main-EventThread] INFO  org.apache.solr.common.cloud.ZkStateReader:process:210  - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 18)
2013-12-05 14:01:22,846 [recoveryExecutor-7-thread-2] WARN  org.apache.solr.update.UpdateLog:run:1230  - Log replay finished. recoveryInfo=RecoveryInfo{adds=39345 deletes=0 deleteByQuery=0 errors=0 positionOfStart=1103833347}
2013-12-05 14:01:22,846 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy:doRecovery:410  - Replication Recovery was successful - registering as Active. core=fr_green
2013-12-05 14:01:22,846 [RecoveryThread] INFO  org.apache.solr.cloud.ZkController:publish:1017  - publishing core=fr_green state=active
2013-12-05 14:01:22,850 [RecoveryThread] INFO  org.apache.solr.cloud.ZkController:publish:1021  - numShards not found on descriptor - reading it from system property
2013-12-05 14:01:22,860 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy:doRecovery:495  - Finished recovery process. core=fr_green


Best regards,
Anca Kopetz
--
Kelkoo

Anca Kopetz Software engineer

E anca.kop...@kelkoo.com  Y!Messenger kelkooancak
T +33 (0)4 56 09 07 55       
A 4/6 Rue des Meridiens 38130 Echirolles



Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

Ce message et les pièces jointes sont confidentiels et établis à l'attention exclusive de leurs destinataires. Si vous n'êtes pas le destinataire de ce message, merci de le détruire et d'en avertir l'expéditeur.

Reply via email to