RE: What is the real difference between Kafka streaming and Spark Streaming?

Mohammed Guller Sun, 11 Jun 2017 12:42:07 -0700

Just to elaborate on what Vincent wrote: Kafka Streams provides true
record-at-a-time processing, whereas Spark Streaming provides micro-batching
on top of Spark. Depending on your use case, you may find one better than the
other. Both provide stateless and stateful stream processing capabilities.


A few more things to consider:

  1.  If you don’t already have a Spark cluster, but do have a Kafka cluster,
it may be easier to use Kafka Streams, since you don’t need to set up and
manage another cluster.
  2.  On the other hand, if you already have a Spark cluster, but don’t have a
Kafka cluster (for example, because you are using some other messaging
system), Spark Streaming is the better option.
  3.  If you already know and use Spark, you may find it easier to program
with the Spark Streaming API even if you are using Kafka.
  4.  Spark Streaming may give you better throughput, while record-at-a-time
processing in Kafka Streams tends to give lower latency. So you have to decide
which matters more for your stream processing application: latency or
throughput.
  5.  Kafka Streams is relatively new and less mature than Spark Streaming.

Mohammed

From: vincent gromakowski [mailto:[email protected]]
Sent: Sunday, June 11, 2017 12:09 PM
To: yohann jardin <[email protected]>
Cc: kant kodali <[email protected]>; vaquar khan <[email protected]>; user 
<[email protected]>
Subject: Re: What is the real difference between Kafka streaming and Spark 
Streaming?

I think Kafka Streams is good when each row is processed independently of the
others (row parsing, data cleaning...).
Spark is better when processing groups of rows (group by, ML, window
functions...).
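
The distinction above, between per-row work and work over groups of rows, can
be sketched in plain Python. This is only an illustration and uses neither the
Kafka Streams nor the Spark API; the function names, the "user,amount" row
format, and the sample rows are all assumptions made up for the example.

```python
from collections import defaultdict


def clean_row(row):
    """Stateless step: the output depends only on this one row."""
    user, amount = row.strip().split(",")
    return user.lower(), float(amount)


def total_by_user(rows):
    """Grouped step: totals must be remembered across many rows."""
    totals = defaultdict(float)
    for user, amount in rows:
        totals[user] += amount
    return dict(totals)


# Hypothetical input rows in "user,amount" form.
raw = [" Alice,10.0", "bob,2.5 ", "ALICE,5.0"]
cleaned = [clean_row(r) for r in raw]
print(cleaned)
print(total_by_user(cleaned))
```

The first function can run on any row in isolation, while the second needs
state that spans rows, which is where grouping and windowing come in.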

On 11 June 2017 at 8:15 PM, "yohann jardin" 
<[email protected]<mailto:[email protected]>> wrote:

Hey,
Kafka can also do streaming on its own: 
https://kafka.apache.org/documentation/streams
I don’t know much about it, unfortunately. I can only repeat what I have heard
at conferences: that one should give Kafka Streams a try when one's whole
pipeline uses Kafka. I have no pros or cons of my own to offer on this topic.

Yohann Jardin
On 6/11/2017 at 7:08 PM, vaquar khan wrote:

Hi Kant,

Kafka is a message broker used by producers and consumers, while Spark
Streaming is used for real-time processing. Kafka and Spark Streaming work
together; they are not competitors.
Spark Streaming reads data from Kafka and processes it in micro-batches. In
simple terms, it collects data for some time, builds an RDD from it, and then
processes that micro-batch.
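
A minimal sketch of the "collect for some time, then process" idea described
above, in plain Python rather than the Spark API. Spark builds an RDD for each
interval; this sketch uses ordinary lists instead. The names micro_batches and
batch_interval_ms, and the sample events, are assumptions for illustration.

```python
from collections import defaultdict


def micro_batches(events, batch_interval_ms):
    """Group (timestamp_ms, value) events into fixed-length time batches.

    Returns a list of (batch_start_ms, [values]) sorted by batch start.
    """
    buckets = defaultdict(list)
    for timestamp_ms, value in events:
        # Every event in the same interval lands in the same bucket.
        batch_start = (timestamp_ms // batch_interval_ms) * batch_interval_ms
        buckets[batch_start].append(value)
    return sorted(buckets.items())


# Hypothetical events: (arrival time in milliseconds, payload).
events = [(200, "a"), (900, "b"), (1100, "c"), (2500, "d"), (2700, "e")]
for batch_start, values in micro_batches(events, batch_interval_ms=1000):
    # Each batch is processed as a unit, here by simply counting it.
    print(batch_start, len(values), values)
```

In this model an event that arrives early in an interval is not processed
until the interval closes, so the batch interval sets a floor on latency.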


Please read the docs:
https://spark.apache.org/docs/latest/streaming-programming-guide.html


Spark Streaming is an extension of the core Spark API that enables scalable, 
high-throughput, fault-tolerant stream processing of live data streams. Data 
can be ingested from many sources like Kafka, Flume, Kinesis, or TCP sockets, 
and can be processed using complex algorithms expressed with high-level 
functions like map, reduce, join and window. Finally, processed data can be 
pushed out to filesystems, databases, and live dashboards. In fact, you can 
apply Spark’s machine 
learning<https://spark.apache.org/docs/latest/ml-guide.html> and graph 
processing<https://spark.apache.org/docs/latest/graphx-programming-guide.html> 
algorithms on data streams.


Regards,

Vaquar khan

On Sun, Jun 11, 2017 at 3:12 AM, kant kodali 
<[email protected]<mailto:[email protected]>> wrote:
Hi All,

I am trying hard to figure out the real difference between Kafka Streaming
and Spark Streaming, other than that one can be used as part of microservices
(since Kafka Streaming is just a library) and the other is a standalone
framework.

If I can accomplish the same job either way, this is a puzzling question for
me, so it would be great to know what Spark Streaming can do that Kafka
Streaming cannot do efficiently.

Thanks!




--
Regards,
Vaquar Khan
+1-224-436-0783
Greater Chicago
