Hi,

I've been reading the Spark Portable Runner docs, following the Java examples
where available, and I'm a little confused by how they're organized. The docs
seem to say that the JobService both submits the code to the linked Spark
cluster (identified by the master URL) and requires you to run a spark-submit
command afterwards on whatever artifacts it builds.
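For reference, here's roughly the setup I've been testing, in case it helps
pin down where I'm confused. This is just a sketch based on my reading of the
docs: the image tag, endpoint port, and master URL are assumptions taken from
the defaults, and the pipeline body is elided.

    // Start the published job server image against the cluster's master URL,
    // e.g. (host and tag are placeholders for my setup):
    //   docker run --net=host apache/beam_spark3_job_server:latest \
    //       --spark-master-url=spark://<master-host>:7077
    //
    // Then point the pipeline at the job server's default endpoint:
    import org.apache.beam.runners.portability.PortableRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.options.PortablePipelineOptions;

    public class SparkPortableSubmit {
      public static void main(String[] args) {
        PortablePipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(PortablePipelineOptions.class);
        options.setRunner(PortableRunner.class);
        options.setJobEndpoint("localhost:8099");     // job server's gRPC port
        options.setDefaultEnvironmentType("DOCKER");  // run the SDK harness in Docker

        Pipeline p = Pipeline.create(options);
        // ... pipeline construction elided ...
        p.run().waitUntilFinish();
      }
    }

With that setup, my (possibly wrong) understanding is that the job server
itself submits the job to the master, so I don't see where the separate
spark-submit step is supposed to fit in.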

Unfortunately I'm not that familiar with Spark in general, so I'm probably
misunderstanding something here, but the job server images either lack
documentation entirely or just repeat the Spark runner page from the main docs.

For context, I'm trying to port some code that we currently run on the
Dataflow runner (on GCP) so that it also runs on AWS. A Spark cluster on EKS
(either self-managed or potentially through EMR, though likely not, based on
what I'm reading in the docs and some brief testing) seems like the closest
analog.

The new Tour of Beam does the same thing, except it only really has examples
for Python and adds a few more typos. I haven't found any existing questions
like this elsewhere, so I assume I'm just missing something that should be
obvious.

Thanks for your time.