Hi,

You just need to create a shaded jar for the SparkRunner and submit it with 
"spark-submit" and the CLI options "--deploy-mode cluster --master yarn".
You also need to specify "--runner=SparkRunner --sparkMaster=yarn" as pipeline 
options.
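
As a rough sketch, the two sets of options could be combined as below. The jar name and main class are hypothetical placeholders, and "--class" is only needed if the jar manifest does not already name the main class. Options placed before the jar go to spark-submit; options placed after it are passed to the pipeline. The snippet only prints the command so it can be reviewed first.

```shell
# Hypothetical placeholders: replace with your own build output and entry point.
APP_JAR="my-pipeline-bundled.jar"     # shaded jar containing the pipeline and the Spark runner
MAIN_CLASS="com.example.MyPipeline"   # class with the pipeline's main() method

# Build the command as positional parameters so each argument stays intact.
# Before the jar: spark-submit options. After the jar: Beam pipeline options.
set -- spark-submit \
  --class "$MAIN_CLASS" \
  --master yarn \
  --deploy-mode cluster \
  "$APP_JAR" \
  --runner=SparkRunner \
  --sparkMaster=yarn

# Dry run: print the assembled command instead of executing it.
echo "$@"
```

To actually submit the job from a machine with Spark and YARN access (for example the EMR master node), replace the final echo line with "$@" on its own.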

—
Alexey

> On 26 Oct 2021, at 19:07, Holt Spalding <hspald...@knock.com> wrote:
> 
> Hello,
> 
> I've been having an incredibly difficult time running an Apache Beam pipeline 
> on a Spark cluster in AWS EMR. Spark on EMR is configured for YARN, which 
> appears to be the primary source of my issues. The documentation here: 
> https://beam.apache.org/documentation/runners/spark/ 
> only seems to describe how to run Beam on a cluster in Standalone mode. I've 
> tried different Beam pipeline arguments, but it seems to run in local mode 
> every time after it can't find a Spark master URL. Has anyone run into this 
> issue, and/or does anyone have suggestions or examples of how to get this to 
> work? Thank you for your help. 
> 
