Re: Kafka + Storm

Corey Nolet Thu, 14 Aug 2014 20:55:56 -0700

Kafka is also distributed by design, which is not something easily achieved
by queuing brokers like ActiveMQ, or by JMS (1.0) brokers in general. Kafka
allows data to be partitioned across many machines, and more machines can be
added as your data grows.
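
A minimal sketch of what "partitioned across many machines" means, assuming
messages carry a key and that the partition is picked by hashing that key.
The function names and the use of CRC32 are assumptions for illustration
only; Kafka's own partitioner uses a different hash.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition using a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def spread(keys, num_partitions):
    """Group keys by the partition each one would land on."""
    partitions = {p: [] for p in range(num_partitions)}
    for key in keys:
        partitions[partition_for(key, num_partitions)].append(key)
    return partitions


# Hypothetical keys: the same key always maps to the same partition, and
# each partition can live on a different machine.
print(spread(["user-1", "user-2", "user-3", "user-4"], num_partitions=3))
```

Note that changing the partition count changes where existing keys map under
this simple modulo scheme.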





On Thu, Aug 14, 2014 at 11:20 PM, Justin Workman <justinjwork...@gmail.com>
wrote:

> Absolutely!
>
> Sent from my iPhone
>
> On Aug 14, 2014, at 9:02 PM, anand nalya <anand.na...@gmail.com> wrote:
>
> I agree. Not for the long run, but for small bursts in the data production
> rate, say at peak hours, Kafka can help provide a somewhat consistent
> load on the Storm cluster.
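
A rough sketch of the burst-smoothing point above, assuming a consumer with a
fixed capacity per tick and a buffer that holds whatever it cannot process
yet. The function name and all numbers are hypothetical.

```python
def queue_depths(arrivals, capacity_per_tick):
    """Return the backlog left in the buffer after each tick.

    arrivals: number of messages produced in each tick
    capacity_per_tick: messages the consumer can process per tick
    """
    depth = 0
    depths = []
    for produced in arrivals:
        # The consumer never processes more than its capacity, so the
        # buffer absorbs the excess and drains once the burst is over.
        depth = max(0, depth + produced - capacity_per_tick)
        depths.append(depth)
    return depths


# A short peak (ticks 2 and 3) against a consumer that handles 100 per tick.
print(queue_depths([80, 250, 200, 50, 50, 50, 50, 50], 100))
# [0, 150, 250, 200, 150, 100, 50, 0]
```

The consumer sees at most 100 messages per tick throughout. This only works
while average production stays below capacity, which matches the "not for the
long run" caveat.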
> ------------------------------
> From: Justin Workman <justinjwork...@gmail.com>
> Sent: 15-08-2014 07:53
> To: user@storm.incubator.apache.org
> Subject: Re: Kafka + Storm
>
> I suppose not directly. It depends on the lifetime of your Kafka queues
> and on your latency requirements. You need to make sure you have enough
> "doctors", or in Storm language workers, in your Storm cluster to process
> your messages within your SLA.
>
> In our case, we have a 3 hour lifetime, or TTL, configured for our
> queues, meaning records in the queue older than 3 hours are purged. We also
> have an internal SLA (team goal, not published to the business ;)) of 10
> seconds from event to end of stream and available for end user consumption.
>
> So we need to make sure we have enough Storm workers to 1) meet the
> normal SLA and 2) be able to "catch up" on the queues when we have to take
> Storm down for maintenance and such and the queues build up.
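
A back-of-the-envelope sketch of the "catch up" sizing described above. It
assumes constant incoming and processing rates. The function name and the
rates are hypothetical and are not figures from this thread.

```python
def catch_up_seconds(backlog, incoming_rate, processing_rate):
    """Seconds needed to drain `backlog` messages while new ones keep arriving.

    Returns None when processing is not faster than arrival, because the
    backlog would then never shrink.
    """
    spare = processing_rate - incoming_rate
    if spare <= 0:
        return None
    return backlog / spare


# Hypothetical: 1 hour of downtime at 2,000 msgs/s builds the backlog, and
# the topology can process 3,000 msgs/s once it is back up.
backlog = 2000 * 3600
print(catch_up_seconds(backlog, incoming_rate=2000, processing_rate=3000))  # 7200.0
```

If the lag ever grows past the retention period (3 hours in the setup
described above), the oldest unread records are purged before Storm gets to
them, so spare capacity matters as much as steady-state capacity.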
>
> There are many knobs you can tune for both Storm and Kafka. We have spent
> many hours tuning things to meet our SLAs.
>
> Justin
>
> Sent from my iPhone
>
> On Aug 14, 2014, at 8:05 PM, anand nalya <anand.na...@gmail.com> wrote:
>
> Also, since Kafka acts as a buffer, Storm is not directly affected by the
> speed of your data sources/producers.
> ------------------------------
> From: Justin Workman <justinjwork...@gmail.com>
> Sent: 15-08-2014 07:12
> To: user@storm.incubator.apache.org
> Subject: Re: Kafka + Storm
>
> Good analogy!
>
> Sent from my iPhone
>
> On Aug 14, 2014, at 7:36 PM, "Adaryl \"Bob\" Wakefield, MBA" <
> adaryl.wakefi...@hotmail.com> wrote:
>
> Ah, so Storm is the hospital and Kafka is the waiting room where
> everybody queues up to be seen in turn, yes?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
>  *From:* Justin Workman <justinjwork...@gmail.com>
> *Sent:* Thursday, August 14, 2014 7:47 PM
> *To:* user@storm.incubator.apache.org
> *Subject:* Re: Kafka + Storm
>
> If you are familiar with WebLogic or ActiveMQ, it is similar. Let's see
> if I can explain; I am definitely not a subject matter expert on this.
>
> Within Kafka you can create "queues", e.g. a webclicks queue. Your web
> servers can then send click events to this queue in Kafka. The web servers,
> or whatever agent writes the events to this queue, are referred to as the
> "producer". Each event, or message, in Kafka is assigned an id.
>
> On the other side there are "consumers", in Storm's case the Storm Kafka
> spout, which can subscribe to this webclicks queue to consume the messages
> that are in the queue. The consumer can consume a single message from the
> queue, or a batch of messages, as Storm does. The consumer keeps track of
> the latest offset (the Kafka message id) that it has consumed. This way,
> the next time the consumer checks to see if there are more messages to
> consume, it will ask for messages with a message id greater than its last
> offset.
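
The offset bookkeeping described above can be sketched with a toy in-memory
log. This is not the Kafka client API; the `Log` and `Consumer` classes and
their methods are hypothetical names used only to show the idea.

```python
class Log:
    """Toy stand-in for one Kafka queue: an append-only list of messages."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Store a message and return the id (offset) assigned to it."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read_after(self, last_offset, max_messages):
        """Return (offset, message) pairs with offset > last_offset."""
        start = last_offset + 1
        batch = self._messages[start:start + max_messages]
        return list(enumerate(batch, start=start))


class Consumer:
    """Remembers the last offset it consumed and asks only for newer ones."""

    def __init__(self, log):
        self.log = log
        self.last_offset = -1  # nothing consumed yet

    def poll(self, max_messages=2):
        batch = self.log.read_after(self.last_offset, max_messages)
        if batch:
            self.last_offset = batch[-1][0]
        return [message for _, message in batch]


log = Log()
for click in ["/home", "/cart", "/checkout"]:
    log.append(click)

consumer = Consumer(log)
print(consumer.poll())  # ['/home', '/cart']
print(consumer.poll())  # ['/checkout']
print(consumer.poll())  # []
```

Because the position lives in the consumer and reading does not remove
anything from the log, a second `Consumer` over the same `Log` would start
from its own offset and see every message again.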
>
> This helps with the reliability of the event stream and helps guarantee
> that your events/messages make it start to finish through your stream,
> assuming the events get to Kafka ;)
>
> Hope this helps and makes some sort of sense. Again, sent from my iPhone ;)
>
> Justin
>
> Sent from my iPhone
>
> On Aug 14, 2014, at 6:28 PM, "Adaryl \"Bob\" Wakefield, MBA" <
> adaryl.wakefi...@hotmail.com> wrote:
>
>   I get your reasoning at a high level. I should have specified that I
> wasn’t sure what Kafka does. I don’t have a hard software engineering
> background. I know that Kafka is “a message queuing” system, but I don’t
> really know what that means.
>
> (I can’t believe you wrote all that from your iPhone....)
> B.
>
>
>  *From:* Justin Workman <justinjwork...@gmail.com>
> *Sent:* Thursday, August 14, 2014 7:22 PM
> *To:* user@storm.incubator.apache.org
> *Subject:* Re: Kafka + Storm
>
> Personally, we looked at several options, including writing our own
> Storm source. There are limited Storm sources with community support out
> there. For us, it boiled down to the following:
>
> 1) community support and what appeared to be a standard method. Storm
> now includes the Kafka source as a bundled component. This made the
> implementation much faster, because the code was done.
> 2) the durability (replication and clustering) of Kafka. We have a three
> hour retention period on our queues, so if we need to do maintenance on
> Storm or deploy an updated topology, we don't need to stop or replay any
> sources.
> 3) the ability to have other tools attach to the Kafka queues to consume
> the same events for other purposes.
> 4) to complement point #1, it's easy to write to Kafka. So it was little
> effort to start sending our desired data to Kafka.
>
> These are our main reasons (I'm sure there were more). Each use case is
> going to be different and Kafka might not be the best choice for everyone.
> For us it made sense.
>
> Justin
>
> Sent from my iPhone
>
> On Aug 14, 2014, at 6:08 PM, "Adaryl \"Bob\" Wakefield, MBA" <
> adaryl.wakefi...@hotmail.com> wrote:
>
>   Can someone tell me why people put Kafka in front of Storm? Can’t Storm
> ingest messages without having Kafka in the middle?
>
> B.
>
>
