On Sat, Sep 24, 2011 at 4:54 PM, Yang <teddyyyy...@gmail.com> wrote:
> I'm using 1.0.0
>
> there seems to be too many node Up/Dead events detected by the failure
> detector.
> I'm using a 2 node cluster on EC2, in the same region, same security
> group, so I assume the message drop
> rate should be fairly low.
> but in about every 5 minutes, I'm seeing some node detected as down,
> and then Up again quickly

This is fairly common on EC2 due to wild variance in the network.
Increase your phi_convict_threshold to 10 or higher, but I wouldn't go
over 12, since the effect of raising it is roughly exponential.

-Brandon
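
To make the "roughly exponential" remark concrete, here is a minimal sketch of how a phi accrual threshold behaves. It assumes heartbeat inter-arrival times follow an exponential distribution and a mean gossip interval of 1 second; both are simplifying assumptions for illustration, and the function names are hypothetical rather than anything from Cassandra's code. Under that model, phi is -log10 of the probability that a live node would stay silent this long, so each extra point of threshold makes a false "down" verdict about 10 times less likely while adding only a fixed amount of detection delay.

```python
import math


def silence_needed(threshold, mean_interval_s=1.0):
    """Seconds without a heartbeat before phi reaches `threshold`.

    Assumes exponentially distributed inter-arrival times, so
    phi(t) = -log10(exp(-t / mean)) = t / (mean * ln 10).
    """
    return threshold * mean_interval_s * math.log(10)


def false_positive_probability(threshold):
    """Chance that a healthy node is silent long enough to hit `threshold`."""
    return 10.0 ** (-threshold)


if __name__ == "__main__":
    # 8 is the shipped default; 10 and 12 are the values discussed above.
    for threshold in (8, 10, 12):
        print(
            f"threshold={threshold:>2}  "
            f"silence={silence_needed(threshold):5.1f}s  "
            f"p(false positive)={false_positive_probability(threshold):.0e}"
        )
```

The setting itself is `phi_convict_threshold` in cassandra.yaml. The trade-off the sketch shows is that a higher value suppresses spurious Up/Dead flapping on a jittery network, at the cost of taking longer to notice a node that really is down.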