Re: Running Beam python pipeline on Spark

2020-06-03 Thread Xinyu Liu
Thanks for the pointers, Thomas. Let me give it a shot tomorrow. Thanks, Xinyu

Re: Running Beam python pipeline on Spark

2020-06-03 Thread Thomas Weise
If all Python dependencies are pre-installed on the YARN container hosts, then you can use the process environment to spawn processes, like so: https://github.com/lyft/flinkk8soperator/blob/bb8834d69e8621d636ef2085fdc167a9d2c2bfa3/examples/beam-python/src/beam_example/pipeline.py#L16-L17

Thomas
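The linked pipeline configures the portable runner's PROCESS environment through pipeline options. A minimal sketch of what such options look like; the boot-binary path and job-server address below are placeholders for illustration, not values taken from the thread:

```python
import json

# Assumed path to Beam's SDK worker boot binary on the YARN hosts;
# substitute whatever location is actually installed on your containers.
BOOT_COMMAND = "/opt/apache/beam/boot"

# Pipeline options telling the portable runner to spawn SDK workers
# as local processes (environment_type=PROCESS) instead of Docker
# containers, which only works if the Python dependencies are already
# present on the hosts, as Thomas notes above.
pipeline_args = [
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",  # placeholder job-server address
    "--environment_type=PROCESS",
    "--environment_config=" + json.dumps({"command": BOOT_COMMAND}),
]

print(pipeline_args[2])  # --environment_type=PROCESS
```

With apache-beam installed, these arguments would be passed to `PipelineOptions` when constructing the pipeline.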

Running Beam python pipeline on Spark

2020-06-03 Thread Xinyu Liu
Hi, folks, I am trying to run an experiment with a simple "hello world" Python pipeline on a remote Spark cluster on Hadoop. So far I have run the SparkJobServerDriver on the YARN application master and managed to submit a Python pipeline to it. SparkPipelineRunner was able to run the portable pipeline
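The setup described here submits a portable pipeline to a remote job server. A rough sketch of what the submission side looks like; the job-server host below is a placeholder, not a value from the thread:

```python
# Options for submitting a portable pipeline to a remote Spark job server
# (here, the SparkJobServerDriver running on the YARN application master).
# The host and port are placeholders for illustration.
job_endpoint = "spark-am-host:8099"  # assumed SparkJobServerDriver address

portable_args = [
    "--runner=PortableRunner",
    "--job_endpoint=" + job_endpoint,
    "--environment_type=PROCESS",  # spawn SDK workers as host processes
]

# With apache-beam installed, the "hello world" submission itself would
# look roughly like this:
#
#   import apache_beam as beam
#   from apache_beam.options.pipeline_options import PipelineOptions
#
#   with beam.Pipeline(options=PipelineOptions(portable_args)) as p:
#       p | beam.Create(["hello", "world"]) | beam.Map(print)

print(portable_args[1])  # --job_endpoint=spark-am-host:8099
```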