Re: Kafka Streams vs Spark Streaming

Guozhang Wang Sat, 25 Feb 2017 10:49:55 -0800

Hello Kohki,

Thanks for the email. Could you say more about your concern with the size
of the state store? It's a bit hard to tell from your description, but I'd
guess you have lots of state stores, each of which is relatively small?


Hello Tianji,

Regarding your question about the maturity and users of Streams, you can
take a look at some of the blog posts companies have written about their
use of Streams in production, for example:

http://engineering.skybettingandgaming.com/2017/01/23/streaming-architectures/

http://developers.linecorp.com/blog/?p=3960

Guozhang


On Sat, Feb 25, 2017 at 7:52 AM, Kohki Nishio <tarop...@gmail.com> wrote:

> I did a bit of research on this recently. The comparison is between
> Spark Structured Streaming (SSS) and Kafka Streams.
>
> Both are relatively new (~1y) and try to solve similar problems. However,
> if you go with Spark, you have to run a cluster. If your environment
> already has a cluster, that's fine, but our team doesn't do any Spark, so
> the initial cost would be very high. Kafka Streams, on the other hand, is
> a Java library; since we have a service framework, doing stream
> processing inside a service is super easy.
>
> However, for some reason people see SSS as more mature and Kafka Streams
> as not so mature (like beta). But the old-fashioned stream APIs are both
> mature enough (in my opinion); I didn't see any difference between
> DStream (Spark) and KStream (Kafka).
>
> DataFrame (Structured Streaming) and KTable, though, I found quite
> different. Kafka's model is more like a changelog, which means you need
> to see the latest entry to make a final decision. I would call this the
> 'Update' model, whereas Spark uses the 'Append' model and doesn't support
> the 'Update' model yet (it's coming in 2.2).
>
> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-modes
>
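
To make the 'Update' vs 'Append' distinction above concrete, here is a
minimal, library-free Python sketch. It uses neither the Kafka Streams nor
the Spark API; the function names and sample events are made up for
illustration, and the append side is simplified to "emit once at the end of
input" (a real engine would finalize rows per window instead).

```python
from collections import Counter


def update_mode(events):
    """Changelog-style output: emit the new running count after every event.

    A consumer has to keep the latest entry per key to know the result.
    """
    counts = Counter()
    out = []
    for key in events:
        counts[key] += 1
        out.append((key, counts[key]))
    return out


def append_mode(events):
    """Append-style output: each result row is written once, never revised.

    Simplifying assumption: rows are finalized only after all input is seen.
    """
    counts = Counter(events)
    # Counter preserves first-seen key order.
    return list(counts.items())


def latest_per_key(changelog):
    """Collapse a changelog into a table holding the latest value per key."""
    table = {}
    for key, value in changelog:
        table[key] = value
    return table


events = ["a", "b", "a", "a", "b"]
changelog = update_mode(events)   # five records, one per input event
final_rows = append_mode(events)  # two records, one per key
```

Collapsing `changelog` with `latest_per_key` yields the same table as
`final_rows`; the two models differ in how many records flow downstream and
in when a reader can treat a value as final.
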
> I wanted to have the 'Append' model with Kafka, but it seems that's not an
> easy thing to do. Also, Kafka Streams uses an internal topic to keep state
> changes for fail-over scenarios, but I'm dealing with lots of tiny pieces
> of information and I have a big concern about the size of the state store
> / topic, so my decision is to go with my own handling on top of the Kafka
> API.
>
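
For the fail-over point above, a rough sketch of the idea of backing a state
store with a changelog: every write is also appended to a log, and a
replacement instance rebuilds its state by replaying that log. `LoggedStore`
and `compact` are hypothetical names, not Kafka Streams classes, and the
in-memory list stands in for the internal topic. The `compact` helper
imitates log compaction (keep only the last record per key), which is what
keeps such a log proportional to the number of keys rather than the number
of updates.

```python
class LoggedStore:
    """Hypothetical key-value store that records every write in a changelog."""

    def __init__(self, changelog=None):
        self.changelog = changelog if changelog is not None else []
        self.state = {}

    def put(self, key, value):
        self.state[key] = value
        self.changelog.append((key, value))

    @classmethod
    def restore(cls, changelog):
        """Rebuild state after a failure by replaying the changelog in order."""
        store = cls(changelog=list(changelog))
        for key, value in changelog:
            store.state[key] = value
        return store


def compact(changelog):
    """Keep only the last record per key, preserving relative order."""
    last_offset = {}
    for offset, (key, _) in enumerate(changelog):
        last_offset[key] = offset
    return [changelog[i] for i in sorted(last_offset.values())]


store = LoggedStore()
for key, value in [("a", 1), ("b", 1), ("a", 2), ("a", 3)]:
    store.put(key, value)

compacted = compact(store.changelog)        # two records instead of four
recovered = LoggedStore.restore(compacted)  # same state as the original
```

With many small values and few updates per key, the compacted log stays
close to the size of the store itself; with many distinct keys, both grow
together, which is the case the size concern is really about.
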
> If you do stateless operations and don't have a Spark cluster, yeah, Kafka
> Streams is perfect.
> If you do complicated stateful operations and happen to have a Spark
> cluster, give Spark a try.
> Otherwise, you have to write code that is optimized for your use case.
>
>
> thanks
> -Kohki
>
>
>
>
> On Fri, Feb 24, 2017 at 6:22 PM, Tianji Li <skyah...@gmail.com> wrote:
>
> > Hi there,
> >
> > Can anyone give a good explanation of the cases in which Kafka Streams is
> > preferred, and the cases in which Spark Streaming is better?
> >
> > Thanks
> > Tianji
> >
>
>
>
> --
> Kohki Nishio
>



-- 
-- Guozhang
