Re: Very regular disconnect and recover - every 2 hours

2015-04-01 Thread Neil Andrassy
Further to this, the cluster that's failing has far more shards than the one that stays up. We have a number of daily date-stamped indexes, each retaining 90 days of data, amounting to 7500+ shards. Looking through GitHub, it looks like some underlying changes might have impacted the
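Date-stamped indexes multiply shard counts quickly, because every daily index brings its own primaries and replicas. The minimal Python sketch below shows that arithmetic. The number of index families and the per-index shard settings are hypothetical, since the thread doesn't give them. The 5 primaries and 1 replica used in the example are the Elasticsearch 1.x defaults.

```python
def total_shard_copies(index_families: int, retention_days: int,
                       primaries_per_index: int, replicas: int) -> int:
    """Shard copies held by a cluster that keeps one index per family per day.

    Each daily index has `primaries_per_index` primary shards, and each
    primary has `replicas` replica copies, so a single index contributes
    primaries_per_index * (1 + replicas) shard copies.
    """
    indexes = index_families * retention_days
    return indexes * primaries_per_index * (1 + replicas)


# Hypothetical example: 8 daily index families kept for 90 days,
# each index using the 1.x defaults of 5 primary shards and 1 replica.
print(total_shard_copies(8, 90, 5, 1))  # 7200
```

Every one of those shard copies lives on some node, so a node dropping out of a cluster this size affects a very large number of shards.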

Re: Very regular disconnect and recover - every 2 hours

2015-03-31 Thread Mark Walkom
You can try winding out the timeouts; see http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#fault-detection

On 31 March 2015 at 16:57, Neil Andrassy neil.andra...@thefilter.com wrote:
It's probably something like that, but it only seems to be a problem

Re: Very regular disconnect and recover - every 2 hours

2015-03-31 Thread Neil Andrassy (The Filter)
Both clusters have the following settings, so if that's related, I think there must be another contributing factor:

discovery.zen.fd.ping_interval: 1s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 3

On 31 March 2015 at 07:32, Mark Walkom markwal...@gmail.com wrote:
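As a rough check on what these values mean: assume each failed fault-detection ping waits the full `ping_timeout` and is then retried straight away. Assume also that a node is dropped after `ping_retries` consecutive failures. Under those assumptions, a silent node would be tolerated for roughly `ping_timeout × ping_retries`. This is an approximation, not Elasticsearch's exact logic, and the helper name is hypothetical.

```python
def rough_failure_detection_seconds(ping_timeout_s: int, ping_retries: int) -> int:
    """Approximate time zen fault detection tolerates a silent node.

    Assumption (not taken from the thread): every failed ping waits the
    full ping_timeout before the next attempt, and the node is declared
    failed after ping_retries consecutive failures. ping_interval only
    spaces out successful pings, so it is ignored here.
    """
    return ping_timeout_s * ping_retries


# Settings quoted above: ping_timeout 60s, ping_retries 3.
print(rough_failure_detection_seconds(60, 3))  # 180
```

On that reading, these settings already tolerate about three minutes of silence. That fits the suspicion that another factor is involved.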

Very regular disconnect and recover - every 2 hours

2015-03-30 Thread Neil Andrassy
Hi, I have two independent clusters running across more or less the same machines. They're split across a pretty high-bandwidth and relatively low-latency VPN link. One cluster is running v1.0.1 and seems to stay up all the time. The other cluster is currently running 1.4.4 (and was running

Re: Very regular disconnect and recover - every 2 hours

2015-03-30 Thread Mark Walkom
It's not the VPN reconnecting, is it?

On 31 March 2015 at 01:32, Neil Andrassy neil.andra...@thefilter.com wrote:
Hi, I have two independent clusters running across more or less the same machines. They're split across a pretty high-bandwidth and relatively low-latency VPN link. One cluster

Re: Very regular disconnect and recover - every 2 hours

2015-03-30 Thread Neil Andrassy
It's probably something like that, but it only seems to be a problem with the more up-to-date version of ES. I'm keen to work out whether there's a configuration option I can tweak in 1.4.4 to make ES more robust in this scenario, or whether there's an issue around recovering dropped TCP connections
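One way to test the VPN theory is to log exactly when the node count changes and how far apart the changes are. A steady two-hour spacing would point at something periodic on the network path. Below is a minimal Python sketch that uses only the standard library and the `number_of_nodes` field of the `_cluster/health` endpoint. The URL, polling interval and function names are illustrative assumptions, not anything from the thread.

```python
import json
import time
import urllib.request
from datetime import datetime

# Hypothetical endpoint: point this at a node of the affected cluster.
HEALTH_URL = "http://localhost:9200/_cluster/health"


def node_count_changes(samples):
    """Given time-ordered (timestamp, number_of_nodes) samples, return
    (timestamp, previous_count, new_count) wherever the count changed."""
    changes = []
    for (_, prev_n), (ts, n) in zip(samples, samples[1:]):
        if n != prev_n:
            changes.append((ts, prev_n, n))
    return changes


def intervals_between(timestamps):
    """Seconds between consecutive events, to spot a fixed cadence."""
    return [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]


def poll(url=HEALTH_URL, every_s=10):
    """Print a line whenever the reported node count changes.

    Runs until interrupted (Ctrl-C). An unreachable node is recorded as
    None, so losing contact also shows up as a change.
    """
    prev = None
    change_times = []
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                n = json.load(resp)["number_of_nodes"]
        except OSError:  # URLError, refused connections and timeouts
            n = None
        sample = (datetime.now(), n)
        if prev is not None:
            for ts, before, after in node_count_changes([prev, sample]):
                change_times.append(ts)
                gaps = intervals_between(change_times[-2:])
                note = f" ({gaps[0] / 3600:.2f} h since last change)" if gaps else ""
                print(f"{ts.isoformat()} nodes {before} -> {after}{note}")
        prev = sample
        time.sleep(every_s)
```

You could run `poll()` against a node on each side of the VPN link and compare the printed timestamps with the VPN gateway's own logs. If they line up, the reconnect is the likely trigger.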