Hi,

I'm working on a project where we may end up with a number of decoupled logic components that have to run inside Spark Streaming, with Kafka as the source of the streaming data. My first question: is it better to chain these logic components together by applying successive transforms to a single RDD, or to apply a transform, write the result back to Kafka, and consume that topic in another stream where more logic is applied? The benefit of the second approach is that it is more decoupled.
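To make the first option concrete, here is a minimal, Spark-independent sketch of what I mean by chaining: each decoupled logic component is a pure function over a batch of records, and the job composes them in order. All the function names and the pipeline structure here are made up for illustration; in Spark each step would be applied via `rdd.map()`/`DStream.transform()` rather than plain lists.

```python
# Sketch of the "chain transforms in one job" option: each logic
# component stays an independent, separately testable function, and
# the job just composes them. Plain Python for illustration only --
# in Spark Streaming each step would be an RDD/DStream transformation.
from functools import reduce

# Hypothetical decoupled logic components; each takes and returns a
# batch of records (a list here, an RDD in Spark).
def parse(records):
    return [r.strip().lower() for r in records]

def filter_valid(records):
    return [r for r in records if r]

def enrich(records):
    return [{"value": r, "length": len(r)} for r in records]

# The pipeline is an ordered list of components, so the components
# remain decoupled even though they run inside a single job.
PIPELINE = [parse, filter_valid, enrich]

def run_pipeline(batch):
    return reduce(lambda data, step: step(data), PIPELINE, batch)

if __name__ == "__main__":
    print(run_pipeline(["  Hello ", "", "World"]))
```

The second option would replace the in-process composition with a Kafka topic between each step, trading extra serialization and broker round-trips for looser coupling between components.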
My second question: what is the best practice here; one large Spark Streaming job with many subscriptions and transform chains, or several jobs split along some logical partitioning? What would the performance drawbacks be in each case? I know this is a broadish question, but any help would be greatly appreciated.

Arti