I know of people running Standalone in production as well (not sure how it scales), and to the best of my knowledge it also has a cluster mode. I only said that I *personally* have experience with Standalone for testing, and YARN for production.
The main issue with the SparkRunner's README is that it relates to batch over HDFS, which is a WIP in Beam (not specific to a runner). The runner will support batch in full once https://issues.apache.org/jira/browse/BEAM-259 is pushed.
As for streaming input: Kafka is supported via Spark's custom KafkaIO, but there is a WIP on replacing it with a fully Beam-compliant KafkaIO.
Event-time windows, triggers and accumulation modes - I'm working on something, but Spark 1.x on its own does not support those, so the behaviour is currently Spark's native behaviour, which could be described as "processing-time trigger, with discarding panes".

Hope this helps,
Amit

On Wed, Sep 28, 2016 at 11:47 PM amir bahmanyari <[email protected]> wrote:

> Sure...Thanks Amit.
> So basically: Standalone for testing & YARN for production?
> Yes, the README for SparkRunner is way outdated. The FlinkRunner version is
> very informative.
> While the README is in progress, could you give me some helpful
> details so I do the perf testing in the right context pls?
> Have a great day
> Amir-
> ------------------------------
> *From:* Amit Sela <[email protected]>
> *To:* amir bahmanyari <[email protected]>; "[email protected]" <[email protected]>
> *Sent:* Wednesday, September 28, 2016 1:13 PM
> *Subject:* Re: Appropriate Spark Cluster Mode for running Beam SparkRunner apps
>
> Hi Amir,
>
> The Beam SparkRunner basically translates the Beam pipeline into a Spark
> job, so it's not much different than a common Spark job.
> I can personally say that I'm running both in Standalone (mostly testing)
> and YARN. I don't have much experience with Spark over Mesos in general
> though.
>
> As for running over YARN, you can simply use the "spark-submit" script
> supplied with the Spark installation, and the runner will pick up the
> necessary (Spark) configurations, such as "--master yarn".
>
> The SparkRunner README is not up-to-date right now, and I will patch it up
> soon. I'm also working on some improvements and new features for the runner,
> so stay tuned!
>
> Thanks,
> Amit
>
> On Wed, Sep 28, 2016 at 10:46 PM amir bahmanyari <[email protected]>
> wrote:
>
> Hi Colleagues,
> I am in the process of setting up a Spark cluster for running Beam SparkRunner
> apps.
> The objective is to collect performance metrics via benchmarking
> techniques.
> The Spark docs suggest the following clustering types.
> Which one is the most appropriate type when it comes to performance
> testing Beam SparkRunner?
> Thanks+regards
> Amir
>
> [image: Inline image]
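The spark-submit invocation described in the thread can be sketched roughly as below. This is a hypothetical example, not taken from the thread: the jar name, main class, and deploy mode are placeholders you would replace with your own pipeline's values.

```shell
# Sketch: submitting a Beam pipeline to Spark on YARN via spark-submit.
# The jar, class, and deploy mode below are hypothetical placeholders.
SPARK_SUBMIT_CMD="spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.beam.MyPipeline \
  my-beam-pipeline-bundled.jar \
  --runner=SparkRunner"

# The runner picks up the Spark configuration (e.g. --master yarn)
# from spark-submit itself; only Beam options follow the jar.
echo "$SPARK_SUBMIT_CMD"
```

For a Standalone cluster, the same script form applies with a master URL such as `--master spark://<host>:7077` instead of `yarn`.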
