Re: Running a Beam Pipeline on GCP Dataproc Flink Cluster

Paweł Kordek Fri, 07 Feb 2020 07:19:29 -0800

Hi

I had a similar use case recently, and adding a metadata key solved the issue 
(see https://github.com/GoogleCloudDataproc/initialization-actions/pull/334). 
You keep the original initialization action and add, for example (using gcloud):
'--metadata flink-snapshot-url=http://mirrors.up.pt/pub/apache/flink/flink-1.9.1/flink-1.9.1-bin-scala_2.11.tgz'
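
For context, a rough sketch of what the full cluster-creation command might look like with that flag added. The cluster name, region, and the initialization-action path below are placeholders I am assuming for illustration (use whatever path you already pass to --initialization-actions); only the flink-snapshot-url metadata key and the mirror URL come from this message. The script just prints the command so it can be reviewed before running.

```shell
# Hypothetical placeholders: replace with your own values.
CLUSTER_NAME="my-flink-cluster"
REGION="us-central1"
# Assumed location of the Flink initialization action; keep the one you already use.
INIT_ACTION="gs://goog-dataproc-initialization-actions-${REGION}/flink/flink.sh"
# Snapshot URL taken from the message above (Flink 1.9.1, Scala 2.11).
FLINK_URL="http://mirrors.up.pt/pub/apache/flink/flink-1.9.1/flink-1.9.1-bin-scala_2.11.tgz"

# Assemble the gcloud command as a single line of text.
build_cmd() {
  printf '%s\n' "gcloud dataproc clusters create ${CLUSTER_NAME} --region ${REGION} --initialization-actions ${INIT_ACTION} --metadata flink-snapshot-url=${FLINK_URL}"
}

# Print the command for review.
build_cmd

# To actually create the cluster (requires gcloud and a configured project):
# eval "$(build_cmd)"
```

Any other Flink binary tarball URL should work the same way, as long as the version is one that Beam supports.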

Cheers
Pawel
________________________________
From: Ismaël Mejía <ieme...@gmail.com>
Sent: Friday, February 7, 2020 2:24 PM
To: Xander Song <iamuuriw...@gmail.com>; user@beam.apache.org 
<user@beam.apache.org>
Cc: u...@flink.apache.org <u...@flink.apache.org>
Subject: Re: Running a Beam Pipeline on GCP Dataproc Flink Cluster

+user@beam.apache.org

On Fri, Feb 7, 2020 at 12:54 AM Xander Song <iamuuriw...@gmail.com> wrote:
I am attempting to run a Beam pipeline on a GCP Dataproc Flink cluster. I have 
followed the instructions at this repo 
(https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/flink) 
to create a Flink cluster on Dataproc using an initialization action. However, 
the resulting cluster uses version 1.5.6 of Flink, and my project requires a 
more recent version (version 1.7, 1.8, or 1.9) for compatibility with Beam 
(https://beam.apache.org/documentation/runners/flink/).

Inside the flink.sh script in the linked repo, there is a line for 
installing Flink from a snapshot URL instead of apt 
(https://github.com/GoogleCloudDataproc/initialization-actions/blob/81e453d8f8a036e371e144d5103aaa38ecb2c679/flink/flink.sh#L53). 
Is this the correct mechanism for installing a different version of Flink 
using the initialization script? If so, how is it meant to be used?

Thank you in advance.
