Hello!

> to also run on AWS

> A spark cluster on EKS seems the closest analog

There's another way of running Beam apps in AWS -
https://aws.amazon.com/kinesis/data-analytics/ - which is essentially
"serverless" Flink. Despite the Kinesis name, you can run any Flink / Beam
job there; you don't have to use Kinesis streams. I've used KDA in multiple
projects so far and it works OK. The FlinkRunner also seems to have more
documentation, as far as I can see.

Here's a pom.xml example:
https://github.com/aws-samples/amazon-kinesis-data-analytics-examples/blob/master/Beam/pom.xml
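In case that link moves, the essential part is just declaring the Beam SDK
and the matching FlinkRunner artifact in your pom.xml. A minimal sketch
(the version numbers here are illustrative - pick the beam-runners-flink
suffix that matches the Flink version your KDA application runs):

```xml
<!-- Beam Java SDK core -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-core</artifactId>
  <version>2.48.0</version>
</dependency>
<!-- FlinkRunner; the -1.15 suffix must match the KDA Flink runtime -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-flink-1.15</artifactId>
  <version>2.48.0</version>
</dependency>
```

Then you submit the fat jar to KDA and run the pipeline with
--runner=FlinkRunner, same as against any other Flink cluster.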

Best Regards,
Pavel Solomin

Tel: +351 962 950 692 | Skype: pavel_solomin | Linkedin
<https://www.linkedin.com/in/pavelsolomin>
On Wed, 21 Jun 2023 at 16:31, Jon Molle via user <user@beam.apache.org>
wrote:

> Hi,
>
> I've been looking at the Spark Portable Runner docs, specifically Java
> when possible, and I'm a little confused about the organization. The docs
> seem to say that the JobService both submits the code to the linked Spark
> cluster (described in the master URL) and requires you to run a
> spark-submit command afterwards on whatever artifacts it builds.
>
> Unfortunately I'm not that familiar with Spark generally, so I'm probably
> misunderstanding more here, but the job server images either totally lack
> documentation or just repeat the spark runner page in the main docs.
>
> For context, I'm trying to port some code that we're currently running on
> a Dataflow runner (on GCP) to also run on AWS. A spark cluster on EKS
> (either self-managed or potentially through EMR, but likely not based on
> what I am reading into the docs and some brief testing) seems the closest
> analog.
>
> The new Tour does the same thing, and on top of that it really only has
> examples for Python, plus a few more typos. I haven't found any existing
> questions like this elsewhere, so I assume that I'm just missing something
> that should be obvious.
>
> Thanks for your time.
>
