This is a “Technical Vision” paper for the Spark runner, which provides general guidelines for the future development of Spark’s Beam support as part of the Apache Beam (incubating) project. Our JIRA component is here: https://issues.apache.org/jira/browse/BEAM/component/12328915/?selectedTab=com.atlassian.jira.jira-projects-plugin:component-summary-panel

Generally, I’m currently working on Datasets integration for batch (to replace RDD) against Spark 1.6, and moving towards enhancing stream-processing capabilities with Structured Streaming (2.0).

You’re welcome to ask these questions on the Apache Beam (incubating) mailing list as well ;) http://beam.incubator.apache.org/mailing_lists/

Thanks,
Amit

From: Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr>
Date: Tuesday, May 17, 2016 at 12:11 AM
To: "user @spark" <user@spark.apache.org>
Cc: Ovidiu Cristian Marcu <ovidiu21ma...@gmail.com>
Subject: Re: What / Where / When / How questions in Spark 2.0 ?

Could you please consider a short answer regarding the Apache Beam Capability Matrix to-dos for the future Spark 2.0 release [4]? (Some related references are below [5][6].)

Thanks

[4] http://beam.incubator.apache.org/capability-matrix/#cap-full-what
[5] https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
[6] https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102

On 16 May 2016, at 14:18, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote:

Hi,

We can see in [2] many interesting (and expected!) improvements (promises) such as extended SQL support, a unified API (DataFrames, Datasets), an improved engine (Tungsten relates to ideas from modern compilers and MPP databases, similar to Flink [3]), Structured Streaming, etc. It seems we are witnessing a smart unification of Big Data analytics (Spark, Flink: the best of both worlds)!

How does Spark respond to the missing What/Where/When/How questions (capabilities) highlighted in the unified Beam model [1]?

Best,
Ovidiu

[1] https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
[2] https://databricks.com/blog/2016/05/11/spark-2-0-technical-preview-easier-faster-and-smarter.html
[3] http://stratosphere.eu/project/publications/