Hi Ming,

I need some more details:
1. How long was the GC pause, and what is the ZooKeeper session timeout?

The behavior you are seeing is expected. The long GC pause causes the
participant to lose its ZooKeeper session, and when that happens we invoke
the state transitions that take the partition back to the OFFLINE state.
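To make the sequence concrete, here is a rough Java sketch. It is not Helix
code: the class, the method names and the 30 s timeout are hypothetical, and
it only models what a single MASTER replica goes through when a GC pause
does or does not outlast the ZooKeeper session timeout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the Helix API.
public class GcSessionSketch {

    // Roughly: ZooKeeper expires a session when the client is silent for longer
    // than the session timeout, and a stop-the-world GC pause silences the client.
    static boolean sessionExpires(long gcPauseMs, long sessionTimeoutMs) {
        return gcPauseMs > sessionTimeoutMs;
    }

    // Transitions a MASTER replica goes through for one GC pause.
    static List<String> transitionsFor(long gcPauseMs, long sessionTimeoutMs) {
        List<String> log = new ArrayList<>();
        if (!sessionExpires(gcPauseMs, sessionTimeoutMs)) {
            return log; // session survives, replica stays MASTER
        }
        // Session lost: the replica is stepped down to OFFLINE.
        log.add("MASTER->SLAVE");
        log.add("SLAVE->OFFLINE");
        // New session after the GC: with only one node running, the controller
        // brings the same replica back up to MASTER.
        log.add("OFFLINE->SLAVE");
        log.add("SLAVE->MASTER");
        return log;
    }

    public static void main(String[] args) {
        long timeoutMs = 30_000; // hypothetical session timeout
        System.out.println(transitionsFor(5_000, timeoutMs));
        System.out.println(transitionsFor(45_000, timeoutMs));
    }
}
```

This is why the GC duration relative to the session timeout matters: a pause
shorter than the timeout causes no transitions at all.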

What behavior are you looking for when there is a long GC pause?

a. You don't want to lose mastership at all, or
b. It's OK to lose mastership, but you don't want to become master again?

One question about your application: can it recover after a long GC
pause?

I don't think this is related to HELIX-79. In that case there were
consecutive GCs, and I think we have a patch for that issue.

Thanks,
Kishore G


On Sat, May 4, 2013 at 6:32 AM, Ming Fang <[email protected]> wrote:

> We're experiencing a potential showstopper issue with how Helix deals
> with very long GC pauses.
> Our system is using the Master Slave model.
> In a simple test, we run just the Master under extreme load, which causes
> GC pauses lasting several seconds.
> During a long GC, the Master gets transitioned to Slave and then to
> Offline.
> After the GC, it gets transitioned back to Slave and then to Master.
>
> I found this JIRA, which may be related:
> HELIX-79 <https://issues.apache.org/jira/browse/HELIX-79>.
> We're scheduled to go live with our system next week.
> Are there any quick workarounds for this problem?
>
>
>
