I know of people running Standalone in production as well (not sure how it
scales), and to the best of my knowledge it also has a cluster mode. I only
said that I *personally* have experience with Standalone for testing, and
YARN for production.

The main issue with the SparkRunner's README is that it relates to batch
over HDFS, which is a WIP in Beam (not specific to any runner). The runner
will support batch in full once
https://issues.apache.org/jira/browse/BEAM-259 is merged.

As for streaming input:
- Kafka is supported via Spark's custom KafkaIO, but there is WIP on
replacing it with a fully Beam-compliant KafkaIO.
- Event-time windows, triggers and accumulation modes: I'm working on
something, but Spark 1.x on its own does not support those, so the
behaviour is currently Spark-native, which could be described as
"processing-time trigger, with discarding panes".
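To make that concrete, here is a tiny plain-Java simulation (not Beam or
Spark API; the class and method names are made up for illustration) of what
"discarding panes" means: each micro-batch's aggregate is emitted on its
own and then forgotten, whereas an accumulating mode would fold earlier
results into later panes.

```java
import java.util.*;

public class DiscardingPanes {
    // Simulates the Spark 1.x behaviour described above: every micro-batch
    // fires a pane with only that batch's aggregate, and the state is
    // discarded afterwards -- later panes do NOT include earlier values.
    static List<Integer> discardingSums(List<List<Integer>> batches) {
        List<Integer> panes = new ArrayList<>();
        for (List<Integer> batch : batches) {
            int sum = 0;
            for (int v : batch) sum += v;
            panes.add(sum); // pane per micro-batch, state discarded
        }
        return panes;
    }

    // For contrast: an accumulating mode would carry state across panes.
    static List<Integer> accumulatingSums(List<List<Integer>> batches) {
        List<Integer> panes = new ArrayList<>();
        int running = 0;
        for (List<Integer> batch : batches) {
            for (int v : batch) running += v;
            panes.add(running); // pane includes all earlier values
        }
        return panes;
    }

    public static void main(String[] args) {
        List<List<Integer>> batches = Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5));
        System.out.println(discardingSums(batches));   // [3, 3, 9]
        System.out.println(accumulatingSums(batches)); // [3, 6, 15]
    }
}
```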

Hope this helps,
Amit

On Wed, Sep 28, 2016 at 11:47 PM amir bahmanyari <[email protected]>
wrote:

> Sure...Thanks Amit.
> So basically: Standalone for testing & YARN for Production?
> Yes, the README for SparkRunner is way outdated; the FlinkRunner version
> is very informative.
> While the README is in progress, could you give me some helpful details
> so I can do the perf testing in the right context please?
> Have a great day
> Amir-
> ------------------------------
> *From:* Amit Sela <[email protected]>
> *To:* amir bahmanyari <[email protected]>; "
> [email protected]" <[email protected]>
> *Sent:* Wednesday, September 28, 2016 1:13 PM
> *Subject:* Re: Appropriate Spark Cluster Mode for running Beam
> SparkRunner apps
>
> Hi Amir,
>
> The Beam SparkRunner basically translates the Beam pipeline into a Spark
> job, so it's not much different from a common Spark job.
> I can personally say that I'm running both in Standalone (mostly testing)
> and YARN. I don't have much experience with Spark over Mesos in general
> though.
>
> As for running over YARN, you can simply use the "spark-submit" script
> supplied with the Spark installation, and the runner will pick up the
> necessary (Spark) configurations, such as "--master yarn".
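> As an illustration, a submit command might look like the sketch below
> (the jar name and main class here are hypothetical placeholders;
> "--master" and "--class" are standard spark-submit flags, and
> "--runner=SparkRunner" is the Beam pipeline option selecting the runner):
>
> ```shell
> # Submit a Beam pipeline built for the SparkRunner to a YARN cluster.
> # my-beam-pipeline.jar / com.example.MyBeamPipeline are placeholders.
> spark-submit \
>   --master yarn \
>   --class com.example.MyBeamPipeline \
>   my-beam-pipeline.jar \
>   --runner=SparkRunner
> ```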
>
> The SparkRunner README is not up-to-date right now, and I will patch it
> up soon. I'm also working on some improvements and new features for the
> runner, so stay tuned!
>
> Thanks,
> Amit
>
> On Wed, Sep 28, 2016 at 10:46 PM amir bahmanyari <[email protected]>
> wrote:
>
> Hi Colleagues,
> I am in progress setting up Spark Cluster for running Beam SparkRunner
> apps.
> The objective is to collect performance metrics via benchmarking
> techniques.
> The Spark docs suggest the following cluster modes.
> Which one is the most appropriate when it comes to performance testing
> the Beam SparkRunner?
> Thanks+regards
> Amir
>
>
> [image: Inline image]
>
