We should obviously get to the bottom of this. But I was thinking, should we
have some sort of timeouts on the SnapPuller in the slave to avoid such
scenarios? Locking out snap pulls forever is not a good idea.
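
A rough sketch of what such timeouts could look like follows. It assumes the slave fetches each index file over plain HTTP with java.net.HttpURLConnection; that is an assumption, and TimedFetch, open and copyWithDeadline are hypothetical names, not SnapPuller's actual code. Two limits are shown. A per-read timeout catches a connection that goes completely silent. An overall deadline also bounds a transfer that keeps trickling a few bytes at a time, which a read timeout alone would never trip.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URL;

/** Hypothetical helper (not part of Solr) bounding how long one file fetch may block. */
public final class TimedFetch {
    private TimedFetch() {}

    /**
     * Configures connect and per-read timeouts. A read that receives no bytes
     * within readTimeoutMs throws SocketTimeoutException instead of hanging.
     * openConnection() does not contact the server yet.
     */
    public static HttpURLConnection open(URL url, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(connectTimeoutMs);
        conn.setReadTimeout(readTimeoutMs);
        return conn;
    }

    /**
     * Copies in to out, giving up once deadlineNanos (System.nanoTime() units)
     * has passed. Returns the number of bytes copied.
     */
    public static long copyWithDeadline(InputStream in, OutputStream out, long deadlineNanos)
            throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        while (true) {
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new SocketTimeoutException("deadline exceeded after " + total + " bytes");
            }
            int n = in.read(buf);
            if (n == -1) {
                return total;
            }
            out.write(buf, 0, n);
            total += n;
        }
    }
}
```

On a timeout the caller would abort the pull and let the next poll cycle retry, rather than leaving replication locked.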

On Mon, Mar 23, 2009 at 8:57 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> So this is only one slave that hangs up and not the master?
> Can you get thread dumps on both the master and the slave during a hang?
>
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Mon, Mar 23, 2009 at 10:44 AM, Jeff Newburn <jnewb...@zappos.com> wrote:
> > We are having an intermittent problem with replication. We reindex nightly,
> > which usually means there are 2 commits during replication, then a final
> > commit/optimize at the end. For some reason the replication occasionally
> > hangs with the status shown in the screenshot below. This is frustrating,
> > as it completely stalls any further replications. Additionally, it seems to
> > happen only on reindex, and it strikes one server at random, not always
> > the same one.
> >
> >
> > In case the screenshot doesn’t come through:
> >
> > Master        http://10.66.209.38:8080/solr/zeta-main/replication
> >     Latest Index Version:1233423827699, Generation: 6237
> >     Replicatable Index Version:0, Generation: 0
> > Poll Interval     00:05:00
> > Local Index     Index Version: 1233423827684, Generation: 6222
> >     Location: /opt/solr-data/zeta-main/index
> >     Size: 1.29 GB
> >     Times Replicated Since Startup: 3591
> >     Previous Replication Done At: Mon Mar 23 00:18:03 PDT 2009
> >     Config Files Replicated At: Wed Mar 18 06:07:53 PDT 2009
> >     Config Files Replicated: [synonyms.txt]
> >     Times Config Files Replicated Since Startup: 4
> >     Next Replication Cycle At: Mon Mar 23 00:27:55 PDT 2009
> > Current Replication Status     Start Time: Mon Mar 23 00:22:55 PDT 2009
> >     Files Downloaded: 12 / 163
> >     Downloaded: 4.12 MB / 1.41 GB [0.0%]
> >     Downloading File: _5no.tis, Downloaded: 0 bytes / 629.57 KB [0.0%]
> >     Time Elapsed: 26371s, Estimated Time Remaining: 9216278s, Speed: 163 bytes/s
> >
> >
> >
> > --
> > Jeff Newburn
> > Software Engineer, Zappos.com
> > jnewb...@zappos.com - 702-943-7562
> >
>



-- 
Regards,
Shalin Shekhar Mangar.
