Hi Kyle,

Thanks for the response!

On Wed, Aug 26, 2020 at 5:28 PM Kyle Weaver <kcwea...@google.com> wrote:

> > - With the Flink operator, I was able to submit a Beam job, but hit the
> > issue that I need Docker installed on my Flink nodes. I haven't yet tried
> > changing the operator's yaml files to add Docker inside them.
>
> Running Beam workers via Docker on the Flink nodes is not recommended (and
> probably not even possible), since the Flink nodes are themselves already
> running inside Docker containers. Running workers as sidecars avoids that
> problem. For example:
> https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/examples/beam/with_job_server/beam_flink_cluster.yaml#L17-L20
>
The main problem with the sidecar approach, then, is that I can't offer the
Flink cluster as a "service" to which anybody can submit jobs with their own
custom containers - the worker container version is fixed.
Do I understand that correctly?
Seems like the Docker-in-Docker approach is viable, though - it's mentioned
in the Beam Flink K8s design doc
<https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.dtj1gnks47dq>.
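
To make it concrete, here's roughly the submission I'd want arbitrary users
to be able to run against a shared cluster (just a sketch - the master
address and image name are placeholders, and this is exactly the Docker path
that a fixed sidecar image rules out):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Sketch only: master address and image are placeholders.
    options = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_master=<public-ip>:8081",
        # Each user supplies their own SDK harness image:
        "--environment_type=DOCKER",
        "--environment_config=gcr.io/<project>/<custom-sdk-image>:latest",
    ])

    with beam.Pipeline(options=options) as p:
        (p | beam.Create(["a", "b", "c"]) | beam.Map(print))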


> > I also haven't tried this
> > <https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/beam_guide.md>
> > yet because it implies submitting jobs using "kubectl apply", which is
> > weird - why not just submit it through the Flink job server?
>
> I'm guessing it goes through k8s for monitoring purposes. I see no reason
> it shouldn't be possible to submit to the job server directly through
> Python, network permitting, though I haven't tried this.
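
That matches my understanding - I'd expect direct submission to look roughly
like this (a sketch; the job server port 8099 and the worker pool port 50000
are assumptions taken from the operator's example yaml, and I haven't tried
this either):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Sketch only: endpoints are assumptions from the operator's example yaml.
    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=<job-server-host>:8099",
        # Point the runner at the worker-pool sidecar instead of
        # spawning Docker on the Flink nodes:
        "--environment_type=EXTERNAL",
        "--environment_config=localhost:50000",
    ])

    with beam.Pipeline(options=options) as p:
        (p | beam.Create(["smoke", "test"]) | beam.Map(print))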
>
>
>
> On Wed, Aug 26, 2020 at 4:10 PM Eugene Kirpichov <ekirpic...@gmail.com>
> wrote:
>
>> Hi folks,
>>
>> I'm still working with Pachama <https://pachama.com/> right now; we have
>> a Kubernetes Engine cluster on GCP and want to run Beam Python batch
>> pipelines with custom containers against it.
>> Flink and Cloud Dataflow are the two options; Cloud Dataflow doesn't
>> support custom containers for batch pipelines yet, so we're going with Flink.
>>
>> I'm struggling to find complete documentation on how to do this. There
>> seems to be lots of conflicting or incomplete information: several ways to
>> deploy Flink, several ways to get Beam working with it, bizarre
>> StackOverflow questions, and no documentation explaining a complete working
>> example.
>>
>> == My requests ==
>> * Could people briefly share their working setup? Would be good to know
>> which directions are promising.
>> * It would be particularly helpful if someone could volunteer an hour of
>> their time to talk to me about their working Beam/Flink/k8s setup. It's for
>> a good cause (fixing the planet :) ) and on my side I volunteer to write up
>> the findings to share with the community so others suffer less.
>>
>> == Appendix: My findings so far ==
>> There are multiple ways to deploy Flink on k8s:
>> - The GCP marketplace Flink operator
>> <https://cloud.google.com/blog/products/data-analytics/open-source-processing-engines-for-kubernetes>
>> (couldn't get it to work) and the respective CLI version
>> <https://github.com/GoogleCloudPlatform/flink-on-k8s-operator>
>> (buggy, but I got it working)
>> - https://github.com/lyft/flinkk8soperator (haven't tried)
>> - Flink's native k8s support
>> <https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/native_kubernetes.html>
>> (super easy to get working)
>>
>> I confirmed that my Flink cluster was operational by running a simple
>> Wordcount job, initiated from my machine. However, I wasn't yet able to get
>> Beam working:
>>
>> - With the Flink operator, I was able to submit a Beam job, but hit the
>> issue that I need Docker installed on my Flink nodes. I haven't yet tried
>> changing the operator's yaml files to add Docker inside them. I also
>> haven't tried this
>> <https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/beam_guide.md>
>> yet because it implies submitting jobs using "kubectl apply", which is
>> weird - why not just submit it through the Flink job server?
>>
>> - With Flink's native k8s support, I tried two things:
>>   - Creating a fat portable jar using --output_executable_path. The jar
>> is huge (200+MB) and takes forever to upload to my Flink cluster - this is
>> a non-starter. But even when I do upload it, I hit the same
>> missing-Docker issue. Haven't tried fixing that yet.
>>   - Simply running my pipeline with --runner=FlinkRunner
>> --environment_type=DOCKER --flink_master=$PUBLIC_IP:8081. The Java process
>> appears to send 1+GB of data somewhere, but the job never even starts.
>>
>> I looked at a few conference talks:
>> -
>> https://www.cncf.io/wp-content/uploads/2020/02/CNCF-Webinar_-Apache-Flink-on-Kubernetes-Operator-1.pdf
>> - seems to imply that I need to add a Beam worker "sidecar" to the Flink
>> workers; and that I need to submit my job using "kubectl apply".
>> - https://www.youtube.com/watch?v=8k1iezoc5Sc, which also mentions the
>> sidecar, as well as the fat jar option.
>>
>> --
>> Eugene Kirpichov
>> http://www.linkedin.com/in/eugenekirpichov
>>
>
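
P.S. For completeness, the fat-jar attempt from the appendix above looked
roughly like this (again a sketch - the path and Flink version are
placeholders, and the resulting jar still hits the missing-Docker issue at
runtime):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Sketch only: writes a self-contained jar instead of submitting the job.
    options = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_version=1.10",                        # match the cluster
        "--output_executable_path=/tmp/pipeline.jar",  # where the fat jar lands
        "--environment_type=DOCKER",
    ])

    with beam.Pipeline(options=options) as p:
        (p | beam.Create(["a", "b"]) | beam.Map(print))

    # The jar can then be uploaded via the Flink web UI or REST API.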

-- 
Eugene Kirpichov
http://www.linkedin.com/in/eugenekirpichov
