Sure!

A few things pushed us into the upgrade. Our machines had very little
storage, we really wanted replication, and we weren't monitoring our
existing machines well because JMX was turned off (and, without
replication and leader failover, a restart was hard).

The first wrinkle was that Kafka 0.8 is basically a whole new system. Since
the only real way to upgrade is to set up a new cluster, things were
obviously harder than install & reboot. :)

   - We had to read a lot of the Storm spout code to understand how it
   works with leadership and replication. We haven't upgraded Storm yet, so
   we had to use storm-kafka-0.8-plus
   <https://github.com/wurstmeister/storm-kafka-0.8-plus> since it's not in
   our distribution.
      - It's really important that you test your producers and consumers
      against the failure of a partition leader. The spout seems to throw an
      uncaught exception and crash, then restart. At least we now know this
      is the behavior, so we can watch for it and test newer versions of Storm.
   - Replication was slower, but well worth it for us. We also had to
   switch from the samsa Python client <https://pypi.python.org/pypi/samsa>
   to kafka-python <https://github.com/mumrah/kafka-python>.
      - It's important that you grok the sync/async and cluster/local commit
      modes, make deliberate decisions about them, and load test them in your
      environment.
      - Our use of samsa was async, but I'm not sure anyone knew that. Our
      average write times went from ~0.5ms to 35ms.
   - It's important to verify that your producers are making proper use of
   partitioning so that you don't end up with hot spots in partitions!!
      - We deployed a few producers using a KeyedProducer but set no key,
      which caused it to pick only two partitions. :)
   - The information in this post
   <http://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/>
   was very helpful.
   - The quoted MBean names in JMX are really weird, and the metrics are
   structured in a way that's a real pain to fetch with JMX tools. Ugh.
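
To make the hot-partition point concrete, here is a minimal Python sketch of key-hash partitioning. It is not kafka-python's partitioner: `pick_partition`, `partition_histogram`, the MD5-modulo scheme, the partition count, and the `event-N` keys are all assumptions made up for illustration. In this simplified model an unkeyed message is hashed like an empty key, so every unkeyed write lands on one partition; a real client may behave differently (ours ended up on two).

```python
import hashlib
from collections import Counter
from typing import Iterable, Optional

# Hypothetical partition count for this illustration.
NUM_PARTITIONS = 8


def pick_partition(key: Optional[str], num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition: MD5 of the key, modulo the partition count.

    A missing key is hashed as an empty byte string, so in this simplified
    model every unkeyed message maps to the same partition.
    """
    data = key.encode("utf-8") if key is not None else b""
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def partition_histogram(keys: Iterable[Optional[str]],
                        num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many messages each partition would receive."""
    return Counter(pick_partition(k, num_partitions) for k in keys)


# 1,000 messages with no key: all of them pile onto a single partition.
unkeyed = partition_histogram([None] * 1000)

# 1,000 messages keyed by a hypothetical per-event ID: the load spreads out.
keyed = partition_histogram(f"event-{i}" for i in range(1000))

print("unkeyed:", dict(unkeyed))
print("keyed:  ", dict(sorted(keyed.items())))
```

The practical takeaway is to give keyed producers a key with many distinct values, so the hash actually has something to spread.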

Our actual process was to set up new Kafka 0.8 machines where we made good
decisions about replication factors and partitioning. We wrote up a runbook
of tasks to perform, and it ended up being something like "tear everything
down in this order and stand it back up on Kafka 0.8 after the old stuff
drains."
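
One way to make the "after the old stuff drains" step checkable is to compare, per partition, the offset the consumer has reached with the log-end offset, once the old producers have been stopped. The Python sketch below assumes you can already obtain both sets of offsets (for example from your monitoring); the function names and the sample offsets are hypothetical, not part of any Kafka or Storm API.

```python
from typing import Dict, Optional

# Offsets keyed by partition number. How you collect them (broker queries,
# ZooKeeper, a monitoring tool) is outside this sketch.
Offsets = Dict[int, int]


def partition_lag(log_end: Offsets, consumed: Offsets) -> Dict[int, Optional[int]]:
    """Messages not yet consumed on each partition.

    Returns None for a partition the consumer has no offset for, so a
    missing consumer never looks "caught up" by accident.
    """
    return {
        p: (end - consumed[p]) if p in consumed else None
        for p, end in log_end.items()
    }


def is_drained(log_end: Offsets, consumed: Offsets) -> bool:
    """True once the consumer has caught up on every partition.

    Only meaningful after producers to the old cluster have stopped;
    otherwise log_end keeps moving.
    """
    return all(
        lag is not None and lag <= 0
        for lag in partition_lag(log_end, consumed).values()
    )


# Hypothetical offsets for a two-partition topic.
log_end = {0: 1200, 1: 980}
consumed = {0: 1200, 1: 950}
print(partition_lag(log_end, consumed))  # {0: 0, 1: 30}
print(is_drained(log_end, consumed))     # False
```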

It went off without a hitch until cross-DC write latency combined with an
accidental code change in the producer: I had changed it so that "batch"
writes posted to our API were sent to Kafka one at a time instead of in a
single write. Once we fixed that, things were effectively perfect.
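
The producer bug is easy to see with a back-of-the-envelope model in which every synchronous request costs one round trip. The Python sketch below uses a hypothetical `FakeProducer` that only counts requests and adds a fixed, assumed 35 ms per request (a figure borrowed from the average write time above purely as an example); it ignores payload size and any client-side batching a real producer might do.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeProducer:
    """Hypothetical stand-in for a synchronous producer.

    Every send() is one request, i.e. one round trip; it just counts
    requests and adds a fixed, assumed latency for each one.
    """
    latency_ms: float = 35.0
    requests: int = 0
    elapsed_ms: float = 0.0
    delivered: List[bytes] = field(default_factory=list)

    def send(self, messages: List[bytes]) -> None:
        """Send one request carrying all of `messages`."""
        self.requests += 1
        self.elapsed_ms += self.latency_ms
        self.delivered.extend(messages)


def write_individually(producer: FakeProducer, batch: List[bytes]) -> None:
    # The bug: one request (and one round trip) per event.
    for event in batch:
        producer.send([event])


def write_as_batch(producer: FakeProducer, batch: List[bytes]) -> None:
    # The fix: the whole API batch goes out in a single request.
    producer.send(list(batch))


batch = [f"event-{i}".encode() for i in range(100)]

slow = FakeProducer()
write_individually(slow, batch)

fast = FakeProducer()
write_as_batch(fast, batch)

print(slow.requests, slow.elapsed_ms)  # 100 3500.0
print(fast.requests, fast.elapsed_ms)  # 1 35.0
```

Both paths deliver the same 100 events; only the number of round trips differs, which is why higher cross-DC latency made the mistake so visible.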


On Wed, Aug 20, 2014 at 10:17 AM, Philip O'Toole <philip.oto...@yahoo.com>
wrote:

> Any thoughts on upgrading from 0.7 to 0.8 you'd like to share with the
> community? How did it go? How has it been?
>
> Philip
>
> -----------------------------------------
> http://www.philipotoole.com
>
>
>   On Wednesday, August 20, 2014 10:13 AM, Cory Watson <gp...@keen.io>
> wrote:
>
>
> :P  Thanks Philip!
>
>
> On Wed, Aug 20, 2014 at 10:04 AM, Philip O'Toole <
> philip.oto...@yahoo.com.invalid> wrote:
>
> Nice work. That tool I put together was getting a bit old. :-)
>
>
> I updated the Kafka "ecosystem" page with details of both tools.
>
> https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
>
> Philip
>
>
>
>
>
> On Wednesday, August 20, 2014 9:32 AM, Cory Watson <gp...@keen.io> wrote:
>
>
>
> Hello all!
>
> If you use Storm and Kafka 0.8 together, I may have an interesting project
> for you. Recently at Keen IO <https://keen.io/> we upgraded from Kafka 0.7
> to 0.8 and needed to replace the features of stormkafkamon
> <https://github.com/otoolep/stormkafkamon> for monitoring spout offsets,
> since it only worked with Kafka 0.7. So we created Capillary
> <https://github.com/keenlabs/capillary>!
>
> Capillary is a web application that, given a bit of information
> <https://github.com/keenlabs/capillary#configuration> about your ZooKeeper
> setup for Storm and Kafka, automatically discovers topologies running
> Kafka 0.8 spouts and lets you click on them to see per-partition offsets
> <https://raw.githubusercontent.com/keenlabs/capillary/master/shot.png>.
> It's JVM-based and shouldn't require anything but Java on the machine where
> you run it. We felt this was a useful improvement over the
> install-on-each-host-with-virtualenv model used by stormkafkamon.
>
> In addition to the UI, there is an API you can use to collect this
> information and send it to your stats/observability systems for historical
> analysis or alerting. There's even an example Python script
> <https://github.com/keenlabs/capillary/blob/master/stats-to-datadog.py> we
> use for reporting to Datadog <http://www.datadoghq.com/>. It's very likely
> we'll move this into a scheduled thread inside Capillary using Metrics
> <http://metrics.codahale.com/> and let you hook up your own reporter.
>
> I hope other users of Kafka + Storm find this useful. Feel free to open an
> issue if you have trouble setting it up or configuring it, or send PRs if
> you add functionality. Thanks!
>
> (I'll be cross-posting this to the Storm list also!)
>
> --
> Cory Watson
> Principal Infrastructure Engineer // Keen IO
>
>
>
>
>
>
>


-- 
Cory Watson
Principal Infrastructure Engineer // Keen IO
