Hi,

I have a Python Beam job that runs on Dataflow, but we would like to submit it to a Spark Dataproc cluster, with no Flink involvement. I have spent days trying to figure out how to use the PortableRunner with the beam_spark_job_server to submit my Python Beam job to Spark on Dataproc, without success. The Beam docs mostly cover Flink, and there is no guide for Spark on Dataproc. Some relevant questions:

1- What is spark-master-url for a remote cluster on Dataproc? Is 7077 the master URL port?
2- Should we SSH-tunnel to the spark-master-url port using gcloud compute ssh?
3- What should environment_type be? Can we use DOCKER? If so, what is the SDK Harness configuration?
4- Should we run the job server outside the Dataproc cluster, or on the master node?
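For context, here is a minimal sketch of the kind of submission we are attempting. The endpoint `localhost:8099` assumes a beam_spark_job_server is reachable there (e.g. started locally or tunneled from the cluster); the environment settings are the part I am unsure about, so treat the flag values below as placeholders, not a verified working configuration:

```python
# Sketch of submitting a Python Beam pipeline through the portable Spark
# runner. Assumes a Beam Spark job server is listening on localhost:8099
# (its default gRPC port) and pointed at the cluster's Spark master.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",  # Beam Spark job server endpoint (assumed)
    "--environment_type=DOCKER",      # question 3: is DOCKER right here, and
                                      # which SDK harness image/config to use?
])

with beam.Pipeline(options=options) as p:
    (p
     | "Create" >> beam.Create([1, 2, 3])
     | "Double" >> beam.Map(lambda x: x * 2))
```

This is essentially the pipeline-options configuration in question; the open points are where the job server should run (questions 1, 2, and 4) and how the SDK harness should be configured (question 3).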
Thanks, Mahan