Hi,

So I am working on a project where we might end up with a bunch of
decoupled logic components that have to run inside Spark Streaming. We are
using Kafka as the source of the streaming data.
My first question: is it better to chain these components together by
applying successive transformations to a single DStream/RDD, or to apply
one transformation, write the result back to Kafka, and consume that topic
in another stream where the next piece of logic is applied? The benefit of
the second approach is that it is more decoupled.
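
To make this concrete, here is roughly what I mean by the two options.
This is just a sketch against the Spark 1.x direct-stream API for Kafka
0.8; parseEvent/enrich/isInteresting and the topic/broker names are
placeholders for our actual components and setup:

import kafka.serializer.StringDecoder
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ChainingSketch {
  // Stand-ins for our decoupled logic components
  def parseEvent(v: String): String = v
  def enrich(v: String): String = v
  def isInteresting(v: String): Boolean = true

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("chaining-sketch"), Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    val input = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("raw-events"))

    // Option A: chain the components as transformations on one DStream
    input.map { case (_, value) => parseEvent(value) }    // component 1
      .map(enrich)                                        // component 2
      .filter(isInteresting)                              // component 3
      .print()

    // Option B: run only component 1 here and publish its output to an
    // intermediate topic; a separate stream (or a separate job) would
    // subscribe to "parsed-events" and apply component 2 onwards.
    input.map { case (_, value) => parseEvent(value) }.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        val props = new java.util.Properties()
        props.put("bootstrap.servers", "broker1:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)
        partition.foreach(v => producer.send(new ProducerRecord[String, String]("parsed-events", v)))
        producer.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}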

My second question: is it best practice to have one huge Spark Streaming
job with a bunch of subscriptions and transform chains, or should I split
this into several jobs along some logical partitioning?
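
For reference, the single-job variant I have in mind looks something like
the sketch below (again, the topics and transform chains are made-up
placeholders), as opposed to packaging each subscription into its own
application with its own StreamingContext:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object OneBigJobSketch {
  def main(args: Array[String]): Unit = {
    // One StreamingContext hosting several subscriptions, each with its own chain
    val ssc = new StreamingContext(new SparkConf().setAppName("one-big-job"), Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    val orders = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("orders"))
    val clicks = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("clicks"))

    orders.map(_._2).filter(_.nonEmpty).print()   // transform chain 1
    clicks.map(_._2).count().print()              // transform chain 2

    // The alternative would be to split these into separate applications,
    // each with its own StreamingContext and its own driver/executor resources.
    ssc.start()
    ssc.awaitTermination()
  }
}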

Any idea what the performance drawbacks would be in either case? I know
this is a fairly broad question, but any help would be greatly appreciated.

Arti


