All the patching aside, the network (and some of the server hardware) within the AWS East zones has been... brittle, for at least a year or two. We have pinger hosts in all our datacenters, and in every AZ where we have instances.
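
For anyone who wants to run a similar check, here is a minimal sketch of a UDP loss probe in Python. This is not our pinger code. The names `probe_udp_loss` and `loss_percent` are made up for illustration, and it assumes a UDP echo service is listening on the target host and port.

```python
import socket


def loss_percent(sent: int, received: int) -> float:
    """Return packet loss as a percentage of the probes sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def probe_udp_loss(host: str, port: int, count: int = 20,
                   timeout: float = 0.5) -> float:
    """Send `count` numbered datagrams and report the percentage lost.

    Assumption: something at host:port echoes each datagram back unchanged.
    """
    expected = {f"probe-{seq}".encode() for seq in range(count)}
    seen = set()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for seq in range(count):
            sock.sendto(f"probe-{seq}".encode(), (host, port))
            try:
                data, _ = sock.recvfrom(1024)
            except socket.timeout:
                continue  # nothing came back in time for this send
            seen.add(data)  # may be a late reply to an earlier probe
    # Only count replies that match a probe we actually sent.
    return loss_percent(count, len(seen & expected))
```

Run something like `probe_udp_loss("198.51.100.10", 7)` (placeholder address, standard echo port) once a minute from each pinger and log the result. Sustained readings above 80 would correspond to the kind of loss described in this thread.
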
What follows is mostly from an internal study we did last year, so it may no longer be completely valid.

It has been common, almost expected, for US-East to have >80% packet loss, especially to AWS EU, for minutes or even hours at a time. Since this is ICMP and/or UDP (depending on how we wrote the tests), we believe it indicates congestion within AWS' network, since ICMP and UDP are typically among the first packets to be dropped when routers or links congest.

US-East also seems to have the oldest hardware and the highest instance failure rates. This was especially true for the smaller instances, which are believed to be on the older server hardware. We measured "half life to failure" of groups of instances. Tiny/smalls tended to see lots (80%?) die within 30 days at the outside, and many died within 48 hours of launch. Larger instances had much longer lifetimes.

Of course, it is Amazon *web* services, so as long as TCP/80 and TCP/443 are working, few will notice.

I'm guessing that your connections to Zabbix are not TCP? Try opening a long-lived TCP connection to the host, maybe SSH with keepalives. My theory is that the TCP session will be fine and will ride through the Zabbix interruptions you're seeing. At least until your server is rebooted out from under you :-)

On Fri, Sep 26, 2014 at 5:03 AM, Bill Bogstad <[email protected]> wrote:
> On Fri, Sep 26, 2014 at 1:24 PM, Derek Balling <[email protected]> wrote:
>> On Sep 26, 2014, at 7:22 AM, Sean Lally <[email protected]> wrote:
>>
>> Haven't seen that, but they are doing a bunch of scheduled reboots that
>> started yesterday. Guessing they're patching for bash...
>>
>>
>> The reboots started before bash. The current running-theory is there's a bug
>> in Xen that allows a guest to pierce the hypervisor and get outside.
>
> I would say this is confirmed by this Amazon AWS blog post:
>
> http://aws.amazon.com/blogs/aws/ec2-maintenance-update/
>
> They apparently have an Oct. 1st deadline before the bug is made public.
>
> Bill Bogstad
> _______________________________________________
> Tech mailing list
> [email protected]
> https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
> This list provided by the League of Professional System Administrators
> http://lopsa.org/
