From a performance and scalability standpoint, is it better to plug a multi-threaded pipeliner into a Spark job, or to implement the pipelining via Spark's own transformations, such as map or filter?
I'm seeing some reference architectures where things like 'morphlines' are plugged into Spark, but it seems Spark might yield better performance and scalability if each stage of the pipeline is instead expressed as a function in the Spark job itself. Is that the case?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Pipelining-with-Spark-tp22976.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
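To make the second option concrete, here is a minimal sketch of what "each stage is a function" looks like. The stage functions (`parse`, `is_valid`, `enrich`) and the sample data are hypothetical; plain Python `map`/`filter` stands in for the RDD API, but the composition is the same shape you would write in Spark, e.g. `rdd.map(parse).filter(is_valid).map(enrich)`.

```python
# Sketch: each pipeline stage is a plain function, composed via map/filter.
# Stage names (parse, is_valid, enrich) are hypothetical examples.

def parse(line):
    # Hypothetical stage 1: split a CSV-ish line into fields.
    return line.split(",")

def is_valid(fields):
    # Hypothetical stage 2: keep only complete two-field records.
    return len(fields) == 2 and all(fields)

def enrich(fields):
    # Hypothetical stage 3: derive a new value from the record.
    name, qty = fields
    return (name, int(qty) * 2)

lines = ["a,1", "bad", "b,3"]

# Plain-Python stand-in for rdd.map(parse).filter(is_valid).map(enrich):
# map/filter here are lazy, so each record streams through all stages
# one at a time. Spark does the analogous thing by fusing a chain of
# narrow transformations into a single task per partition, so no extra
# threading layer is needed to get stage-to-stage pipelining.
result = list(map(enrich, filter(is_valid, map(parse, lines))))
print(result)  # [('a', 2), ('b', 6)]
```

Because Spark already pipelines chained narrow transformations within a task, an external multi-threaded pipeliner mostly duplicates work Spark's scheduler does for free, while stage-as-function keeps each step individually testable.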