1369725251. It's a fix in the underlying virtualization platform. Here is
the quote from the ticket.

"The issue is a bug in a performance improvement (10% improved PPS when
using Xen PV "netback/netfront" networking) in the latest build of the
virtualization platform, which has only been released to D2 instances. The
issue is triggered by a race condition deadlock in kernel code that your
workload appears to trigger 5-10% of the time."

On Tue, Jun 2, 2015 at 4:26 PM, Henry Cai <h...@pinterest.com.invalid>
wrote:

> Steven,
>
> Do you have the AWS case # (or the Ubuntu bug/case #) when you hit that
> kernel panic issue?
>
> Our company will still be running on AMI image 12.04 for a while, I will
> see whether the fix was also ported onto Ubuntu 12.04
>
> On Tue, Jun 2, 2015 at 2:53 PM, Steven Wu <stevenz...@gmail.com> wrote:
>
> > now I remember we had same kernel panic issue in the first week of D2
> > rolling-out. then AWS fixed it and we haven't seen any issue since. try
> > Ubuntu 14.04 and see if it resolves your remaining kernel/instability
> > issue.
> >
> > On Tue, Jun 2, 2015 at 2:30 PM, Wes Chow <w...@chartbeat.com> wrote:
> >
> >>
> >>   Daniel Nelson <daniel.nel...@vungle.com>
> >>  June 2, 2015 at 4:39 PM
> >>
> >> On Jun 2, 2015, at 1:22 PM, Steven Wu <stevenz...@gmail.com> wrote:
> >>
> >> can you elaborate what kind of instability you have encountered?
> >>
> >> We have seen the nodes become completely non-responsive. Usually they
> >> get rebooted automatically after 10-20 minutes, but occasionally they get
> >> stuck for days in a state where they cannot be rebooted via the Amazon APIs.
> >>
> >>
> >> Same here. It was worse right after d2 launch. We had 6 out of 9 servers
> >> die within 10 hours after spinning them up. Amazon rolled out a fix, but
> >> we're still seeing similar issues, though not nearly as bad. The first
> >> fix was for something network related, and apparently sending lots of data
> >> through the instances caused a kernel panic on the host. We have no
> >> information yet about the current issue.
> >>
> >> Wes
> >>
> >>   Steven Wu <stevenz...@gmail.com>
> >>  June 2, 2015 at 4:22 PM
> >> Wes/Daniel,
> >>
> >> can you elaborate what kind of instability you have encountered?
> >>
> >> we are on Ubuntu 14.04.2 and haven't encountered any issues so far. in
> >> the announcement, they did mention using Ubuntu 14.04 for better disk
> >> throughput. not sure whether 14.04 also addresses any instability issue
> >> you encountered or not.
> >>
> >> Thanks,
> >> Steven
> >>
> >> In order to ensure the best disk throughput performance from your D2
> >> instances on Linux, we recommend that you use the most recent version of
> >> the Amazon Linux AMI, or another Linux AMI with a kernel version of 3.8 or
> >> later. The D2 instances provide the best disk performance when you use a
> >> Linux kernel that supports Persistent Grants – an extension to the Xen
> >> block ring protocol that significantly improves disk throughput and
> >> scalability. The following Linux AMIs support this feature:
> >>
> >>    - Amazon Linux AMI 2015.03 (HVM)
> >>    - Ubuntu Server 14.04 LTS (HVM)
> >>    - Red Hat Enterprise Linux 7.1 (HVM)
> >>    - SUSE Linux Enterprise Server 12 (HVM)
> >>
> >>
> >>
> >>
> >>   Daniel Nelson <daniel.nel...@vungle.com>
> >>  June 2, 2015 at 2:42 PM
> >>
> >> Do you have any workarounds for the d2 issues? We’ve been using them for
> >> our Kafkas too, and ran into the instability. We’re on Ubuntu 12.04 and
> >> plan to try on 14.04 with the latest HWE to see if that helps any.
> >>
> >> Thanks!
> >>   Wes Chow <w...@chartbeat.com>
> >>  June 2, 2015 at 1:39 PM
> >>
> >> We have run d2 instances with Kafka. They're currently unstable --
> >> Amazon confirmed yesterday a host issue with d2 instances that gets
> >> tickled by a Kafka workload. Otherwise, it seems the d2 instance type is
> >> ideal, as it gets an enormous amount of disk throughput and you'll likely
> >> be network bottlenecked.
> >>
> >> Wes
> >>
> >>
> >>   Steven Wu <stevenz...@gmail.com>
> >>  June 2, 2015 at 1:07 PM
> >> EBS (network attached storage) has got a lot better over the last few
> >> years. we don't quite trust it for kafka workload.
> >>
> >> At Netflix, we were going with the new d2 instance type (HDD). our
> >> perf/load testing shows it satisfies our workload. SSD is better in
> >> latency curve but pretty comparable in terms of throughput. we can use the
> >> extra space from HDD for longer retention period.
> >>
> >> On Tue, Jun 2, 2015 at 9:37 AM, Henry Cai <h...@pinterest.com.invalid>
> >>
> >>
> >
>
