Has anyone been running Kafka well on st1 EBS volumes?

We've historically run on m1 and m2 instance types for our Kafka workload
but wanted to move to the M4s to get better price/performance.

We rolled out a single instance in each of two environments with M4 and
1 TB of st1.  Everything seemed to be going well: lower CPU utilization,
good flush times, etc.  Then, a few hours later, both our Kafka flush times
and CPU wait times climbed well above the rest of the cluster and just
stayed there.

Looking at the CloudWatch metrics, our "burst balance" had been slowly
draining over those hours, and the elevated times started as soon as it
was exhausted.

I could possibly believe that with the production workload we were
overwhelming some allocation, but in our staging environment it makes no
sense.  CloudWatch says that our average write size is 40 KB/op, which I
suspect is far too small for st1, as it's designed for large sequential
writes.  I believe we're eating up our IOPS allocation, but I could be
looking at completely the wrong thing.
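
To make that suspicion concrete, here is a rough back-of-the-envelope
sketch of how small writes might drain the bucket.  It is not a measurement.
The st1 figures (40 MiB/s baseline and a 1 TiB credit bucket per TiB of
volume, 250 MiB/s burst) and the rule that a small non-sequential I/O is
charged as a full 1 MiB are assumptions taken from my reading of the EBS
documentation and should be checked.  The 150 ops/s rate is purely
hypothetical, and the function name is made up for illustration.

```python
# Assumed st1 characteristics (verify against current EBS documentation).
BASELINE_MIB_S_PER_TIB = 40.0       # credit refill rate
BURST_MIB_S_PER_TIB = 250.0         # ceiling on credit spend rate
BUCKET_MIB_PER_TIB = 1024.0 * 1024  # 1 TiB of credit per TiB of volume
RANDOM_OP_CHARGE_MIB = 1.0          # assumed charge per non-sequential op


def hours_until_bucket_empty(volume_tib, ops_per_sec, avg_io_kib,
                             sequential_fraction):
    """Estimate hours until a full burst bucket is exhausted.

    Returns None if the workload spends credit no faster than it refills.
    """
    refill = BASELINE_MIB_S_PER_TIB * volume_tib
    seq_ops = ops_per_sec * sequential_fraction
    rand_ops = ops_per_sec - seq_ops
    # Sequential ops are charged for the bytes they move (assumes they
    # merge); random ops are charged a full 1 MiB each (assumption).
    spend = seq_ops * avg_io_kib / 1024.0 + rand_ops * RANDOM_OP_CHARGE_MIB
    spend = min(spend, BURST_MIB_S_PER_TIB * volume_tib)
    net_drain = spend - refill
    if net_drain <= 0:
        return None
    return BUCKET_MIB_PER_TIB * volume_tib / net_drain / 3600.0


if __name__ == "__main__":
    # Hypothetical: 150 ops/s of 40 KiB writes on a 1 TiB volume.
    print("all random:    ", hours_until_bucket_empty(1.0, 150, 40, 0.0))
    print("all sequential:", hours_until_bucket_empty(1.0, 150, 40, 1.0))
```

Under these assumptions, a workload moving only about 6 MiB/s of real data
would still empty the bucket within a few hours if its writes are treated
as random, while the same writes treated as sequential would never drain
it.  That would be consistent with the "fine for a few hours, then slow"
pattern, but it is only a model.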

We're using an XFS filesystem with a 16 M allocsize.

Does anyone have experience with this volume type?  I suspect we're just
holding it wrong.

Cheers,

-Dave


-- 
Dave Mangot
Director of Operations
Librato and Papertrail
