We should obviously get to the bottom of this. But I was thinking, should we have some sort of timeout on the SnapPuller in the slave to avoid such scenarios? Locking out snap pulls forever is not a good idea.
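To make the timeout idea concrete, here is a rough sketch of what I have in mind. This is not the actual SnapPuller code: the class `PullWithTimeout`, the method `runPull` and the `timeoutMs` parameter are all made up for illustration. It only shows the general pattern of running a pull under a lock, giving up after a deadline, and always releasing the lock so later pulls are not blocked.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper, not part of Solr: illustrates a pull guarded by a timeout.
class PullWithTimeout {
    // Only one pull may run at a time, mirroring the "locked out" behaviour.
    private static final ReentrantLock pullLock = new ReentrantLock();

    /**
     * Runs one pull. Returns true if it completed, false if another pull
     * was already running, the pull failed, or the timeout elapsed.
     */
    static boolean runPull(Runnable pull, long timeoutMs) {
        if (!pullLock.tryLock()) {
            return false; // a pull is already in progress
        }
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Future<?> result = exec.submit(pull);
            try {
                result.get(timeoutMs, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                result.cancel(true); // interrupt the stalled pull
                return false;
            } catch (ExecutionException e) {
                return false; // the pull itself threw
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                result.cancel(true);
                return false;
            }
        } finally {
            exec.shutdownNow();
            pullLock.unlock(); // never leave the lock held after a hang
        }
    }
}
```

One caveat: interrupting a thread does not unblock a blocking `java.io` socket read, so a real fix would probably also need a read timeout on the connection itself (for example `URLConnection.setReadTimeout`). Otherwise the stalled thread would linger even though the lock is released.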
On Mon, Mar 23, 2009 at 8:57 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> So this is only one slave that hangs up and not the master?
> Can you get thread dumps on both the master and the slave during a hang?
>
> -Yonik
> http://www.lucidimagination.com
>
> On Mon, Mar 23, 2009 at 10:44 AM, Jeff Newburn <jnewb...@zappos.com> wrote:
> > We are having an intermittent problem with replication. We reindex nightly
> > which usually means there are 2 commits during replication then a final
> > commit/optimize at the end. For some reason the replication will hang
> > occasionally with the following screenshot. This is frustrating as it will
> > completely stall out any further replications. Additionally, it seems to
> > only happen on reindex and it will strike 1 server randomly but not always
> > the same server.
> >
> > In case the screen shot doesn’t come through:
> >
> > Master http://10.66.209.38:8080/solr/zeta-main/replication
> > Latest Index Version:1233423827699, Generation: 6237
> > Replicatable Index Version:0, Generation: 0
> > Poll Interval 00:05:00
> > Local Index Index Version: 1233423827684, Generation: 6222
> > Location: /opt/solr-data/zeta-main/index
> > Size: 1.29 GB
> > Times Replicated Since Startup: 3591
> > Previous Replication Done At: Mon Mar 23 00:18:03 PDT 2009
> > Config Files Replicated At: Wed Mar 18 06:07:53 PDT 2009
> > Config Files Replicated: [synonyms.txt]
> > Times Config Files Replicated Since Startup: 4
> > Next Replication Cycle At: Mon Mar 23 00:27:55 PDT 2009
> > Current Replication Status Start Time: Mon Mar 23 00:22:55 PDT 2009
> > Files Downloaded: 12 / 163
> > Downloaded: 4.12 MB / 1.41 GB [0.0%]
> > Downloading File: _5no.tis, Downloaded: 0 bytes / 629.57 KB [0.0%]
> > Time Elapsed: 26371s, Estimated Time Remaining: 9216278s, Speed: 163 bytes/s
> >
> > --
> > Jeff Newburn
> > Software Engineer, Zappos.com
> > jnewb...@zappos.com - 702-943-7562

--
Regards,
Shalin Shekhar Mangar.