Hi,

You just need to create a shaded (uber) jar of your pipeline that includes the SparkRunner, and submit it with "spark-submit" and the CLI options "--deploy-mode cluster --master yarn". Also, you need to specify "--runner=SparkRunner --sparkMaster=yarn" as pipeline options.
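A rough sketch of what that could look like; the main class and jar names here (org.example.MyBeamPipeline, my-beam-pipeline-shaded.jar) are placeholders, replace them with your own:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class org.example.MyBeamPipeline \
      /path/to/my-beam-pipeline-shaded.jar \
      --runner=SparkRunner \
      --sparkMaster=yarn

The pipeline's main() then just hands the trailing arguments to Beam, e.g.:

    import org.apache.beam.runners.spark.SparkPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    // Hypothetical main class, matching the --class above.
    public class MyBeamPipeline {
      public static void main(String[] args) {
        // Picks up --runner=SparkRunner and --sparkMaster=yarn from the
        // arguments that spark-submit passes through after the application jar.
        SparkPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().as(SparkPipelineOptions.class);

        Pipeline pipeline = Pipeline.create(options);
        // ... build your transforms here ...
        pipeline.run().waitUntilFinish();
      }
    }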
— Alexey

> On 26 Oct 2021, at 19:07, Holt Spalding <hspald...@knock.com> wrote:
>
> Hello,
>
> I've been having an incredibly difficult time running an Apache Beam pipeline on a Spark cluster in AWS EMR. Spark on EMR is configured for YARN, which appears to be the primary source of my issues; the documentation here:
> https://beam.apache.org/documentation/runners/spark/
> only seems to describe how to run Beam on a cluster in standalone mode. I've tried different Beam pipeline arguments, but it seems to run in local mode every time after it can't find a Spark master URL. Has anyone run into this issue, and/or does anyone have suggestions or examples of how to get this to work? Thank you for your help.