Hi Anca; Could you check the conversation at here: http://lucene.472066.n3.nabble.com/ColrCloud-IOException-occured-when-talking-to-server-at-td4061831.html
Thanks; Furkan KAMACI 18 Aralık 2013 Çarşamba tarihinde Anca Kopetz <anca.kop...@kelkoo.com> adlı kullanıcı şöyle yazdı: > Hi, > > In our SolrCloud cluster (2 shards, 8 replicas), the replicas go from time to time into recovering state, and it takes more than 10 minutes to finish to recover. > > In logs, we see that "PeerSync Recovery" fails with the message : > > PeerSync: core=fr_green url=http://solr-08/searchsolrnodefr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates > > Then "Replication Recovery" starts. > > Is there something we can do to avoid the failure of "Peer Recovery" so that the recovery process is more rapid (less than 10 minutes) ? > > The full trace log is here : > > 2013-12-05 13:51:53,740 [http-8080-46] INFO org.apache.solr.handler.admin.CoreAdminHandler:handleRequestRecoveryAction:705 - It has been requested that we recover > 2013-12-05 13:51:53,740 [http-8080-112] INFO org.apache.solr.handler.admin.CoreAdminHandler:handleRequestRecoveryAction:705 - It has been requested that we recover > 2013-12-05 13:51:53,740 [http-8080-112] INFO org.apache.solr.servlet.SolrDispatchFilter:handleAdminRequest:658 - [admin] webapp=null path=/admin/cores params={action=REQUESTRECOVERY&core=fr_green&wt=javabin&version=2} status=0 QTime=0 > 2013-12-05 13:51:53,740 [Thread-1544] INFO org.apache.solr.cloud.ZkController:publish:1017 - publishing core=fr_green state=recovering > 2013-12-05 13:51:53,741 [http-8080-46] INFO org.apache.solr.servlet.SolrDispatchFilter:handleAdminRequest:658 - [admin] webapp=null path=/admin/cores params={action=REQUESTRECOVERY&core=fr_green&wt=javabin&version=2} status=0 QTime=1 > 2013-12-05 13:51:53,740 [Thread-1543] INFO org.apache.solr.cloud.ZkController:publish:1017 - publishing core=fr_green state=recovering > 2013-12-05 13:51:53,743 [Thread-1544] INFO org.apache.solr.cloud.ZkController:publish:1021 - numShards not found on descriptor - reading it from system property > 2013-12-05 13:51:53,746 [Thread-1543] INFO org.apache.solr.cloud.ZkController:publish:1021 - numShards not found on descriptor - reading it from system property > 2013-12-05 13:51:53,755 [Thread-1543] WARN org.apache.solr.cloud.RecoveryStrategy:close:105 - Stopping recovery for zkNodeName=solr-08_searchsolrnodefr_fr_greencore=fr_green > 2013-12-05 13:51:53,756 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:run:216 - Starting recovery process. core=fr_green recoveringAfterStartup=false > 2013-12-05 13:51:53,762 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:495 - Finished recovery process. core=fr_green > 2013-12-05 13:51:53,762 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:run:216 - Starting recovery process. core=fr_green recoveringAfterStartup=false > 2013-12-05 13:51:53,765 [RecoveryThread] INFO org.apache.solr.cloud.ZkController:publish:1017 - publishing core=fr_green state=recovering > 2013-12-05 13:51:53,765 [RecoveryThread] INFO org.apache.solr.cloud.ZkController:publish:1021 - numShards not found on descriptor - reading it from system property > 2013-12-05 13:51:53,767 [RecoveryThread] INFO org.apache.solr.client.solrj.impl.HttpClientUtil:createClient:103 - Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false > 2013-12-05 13:51:54,777 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader:process:210 - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 18) > 2013-12-05 13:51:56,804 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:356 - Attempting to PeerSync from http://solr-02/searchsolrnodefr/fr_green/ core=fr_green - recoveringAfterStartup=false > 2013-12-05 13:51:56,806 [RecoveryThread] WARN org.apache.solr.update.PeerSync:sync:232 - PeerSync: core=fr_green url= http://solr-08/searchsolrnodefr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates > 2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:394 - PeerSync Recovery was not successful - trying replication. core=fr_green > 2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:397 - Starting Replication Recovery. core=fr_green > 2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:399 - Begin buffering updates. core=fr_green > 2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:replicate:127 - Attempting to replicate from http://solr-02/searchsolrnodefr/fr_green/. core=fr_green > 2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.client.solrj.impl.HttpClientUtil:createClient:103 - Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false > 2013-12-05 13:52:01,203 [RecoveryThread] INFO org.apache.solr.handler.SnapPuller:<init>:211 - No value set for 'pollInterval'. Timer Task not started. > 2013-12-05 13:52:01,209 [RecoveryThread] INFO org.apache.solr.handler.SnapPuller:fetchLatestIndex:369 - Master's generation: 4382 > 2013-12-05 13:52:01,209 [RecoveryThread] INFO org.apache.solr.handler.SnapPuller:fetchLatestIndex:370 - Slave's generation: 4382 > 2013-12-05 13:52:01,209 [RecoveryThread] INFO org.apache.solr.handler.SnapPuller:fetchLatestIndex:371 - Starting replication process > 2013-12-05 13:52:01,213 [RecoveryThread] INFO org.apache.solr.handler.SnapPuller:fetchLatestIndex:376 - Number of files in latest index in master: 158 > 2013-12-05 13:52:01,213 [RecoveryThread] INFO org.apache.solr.core.CachingDirectoryFactory:get:360 - return new directory for /opt/kookel/data/searchSolrNode/solrindex/fr1_green/index.20131205135201213 > 2013-12-05 13:52:01,213 [RecoveryThread] INFO org.apache.solr.handler.SnapPuller:fetchLatestIndex:402 - Starting download to org.apache.lucene.store.MMapDirectory@/opt/kookel/data/searchSolrNode/solrindex/fr1_green/index.20131205135201213 lockFactory=org.apache.lucene.store.NativeFSLockFactory@697d36b7fullCopy=true > 2013-12-05 13:52:21,970 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader:process:210 - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 18) > 2013-12-05 13:55:15,039 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader:process:210 - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 18) > 2013-12-05 13:56:27,160 [RecoveryThread] INFO org.apache.solr.handler.SnapPuller:fetchLatestIndex:406 - Total time taken for download : 265 secs > 2013-12-05 13:56:27,540 [RecoveryThread] INFO org.apache.solr.handler.SnapPuller:modifyIndexProps:886 - New index installed. Updating index properties... index=index.20131205135201213 > 2013-12-05 13:56:27,546 [RecoveryThread] INFO org.apache.solr.handler.SnapPuller:fetchLatestIndex:461 - removing old index directory org.apache.lucene.store.MMapDirectory@/opt/kookel/data/searchSolrNode/solrindex/fr1_green/index.20131201033712724 lockFactory=org.apache.lucene.store.NativeFSLockFactory@53949838 > 2013-12-05 13:56:27,765 [RecoveryThread] INFO org.apache.solr.search.SolrIndexSearcher:<init>:197 - Opening Searcher@30a544f3 main > 2013-12-05 13:58:52,337 [RecoveryThread] INFO org.apache.solr.core.CachingDirectoryFactory:closeCacheValue:235 - looking to close /opt/kookel/data/searchSolrNode/solrindex/fr1_green/index.20131201033712724[CachedDir<<refCount=0;path=/opt/kookel/data/searchSolrNode/solrindex/fr1_green/index.20131201033712724;done=true>>] > 2013-12-05 13:58:52,337 [RecoveryThread] INFO org.apache.solr.core.CachingDirectoryFactory:close:304 - Closing directory: /opt/kookel/data/searchSolrNode/solrindex/fr1_green/index.20131201033712724 > 2013-12-05 13:58:52,337 [RecoveryThread] INFO org.apache.solr.core.CachingDirectoryFactory:closeCacheValue:279 - Removing directory before core close: /opt/kookel/data/searchSolrNode/solrindex/fr1_green/index.20131201033712724 > 2013-12-05 13:58:54,172 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:replay:506 - Replaying buffered documents. core=fr_green > 2013-12-05 13:58:54,172 [recoveryExecutor-7-thread-2] WARN org.apache.solr.update.UpdateLog:doReplay:1240 - Starting log replay tlog{file=/opt/kookel/data/searchSolrNode/solrindex/fr1_green/tlog/tlog.0000000000000004091 refcount=2} active=true starting pos=1103833347 > 2013-12-05 13:59:16,660 [recoveryExecutor-7-thread-2] INFO org.apache.solr.search.SolrIndexSearcher:<init>:197 - Opening Searcher@5e48fb12 main > 2013-12-05 14:01:11,908 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader:process:210 - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 18) > 2013-12-05 14:01:22,846 [recoveryExecutor-7-thread-2] WARN org.apache.solr.update.UpdateLog:run:1230 - Log replay finished. recoveryInfo=RecoveryInfo{adds=39345 deletes=0 deleteByQuery=0 errors=0 positionOfStart=1103833347} > 2013-12-05 14:01:22,846 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:410 - Replication Recovery was successful - registering as Active. core=fr_green > 2013-12-05 14:01:22,846 [RecoveryThread] INFO org.apache.solr.cloud.ZkController:publish:1017 - publishing core=fr_green state=active > 2013-12-05 14:01:22,850 [RecoveryThread] INFO org.apache.solr.cloud.ZkController:publish:1021 - numShards not found on descriptor - reading it from system property > 2013-12-05 14:01:22,860 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:495 - Finished recovery process. core=fr_green > > Best regards, > Anca Kopetz > -- > </mail/u/0/s/?view=att&th=14305504e63dc8a1&attid=0.1.1&disp=emb&zw&atsh=1> > > Anca Kopetz Software engineer > > E anca.kop...@kelkoo.com Y!Messenger kelkooancak > T +33 (0)4 56 09 07 55 > A 4/6 Rue des Meridiens 38130 Echirolles > > </mail/u/0/s/?view=att&th=14305504e63dc8a1&attid=0.1.2&disp=emb&zw&atsh=1> </mail/u/0/s/?view=att&th=14305504e63dc8a1&attid=0.1.3&disp=emb&zw&atsh=1> > ________________________________ > Kelkoo SAS > Société par Actions Simplifiée > Au capital de EURO 4.168.964,30 > Siège social : 8, rue du Sentier 75002 Paris > 425 093 069 RCS Paris > > Ce message et les pièces jointes sont confidentiels et établis à l'attention exclusive de leurs destinataires. Si vous n'êtes pas le destinataire de ce message, merci de le détruire et d'en avertir l'expéditeur. >