Holy shit, thanks Sam, this is more help than I could have asked for!! I'll give this a shot later today and report back.
On Thu, Aug 27, 2020 at 10:27 PM Sam Bourne <samb...@gmail.com> wrote:

> Hi Eugene!
>
> I'm struggling to find complete documentation on how to do this. There
> seems to be lots of conflicting or incomplete information: several ways
> to deploy Flink, several ways to get Beam working with it, bizarre
> StackOverflow questions, and no documentation explaining a complete
> working example.
>
> This *is* possible, and I went through all the same frustrations of
> sparse and confusing documentation. I'm glossing over a lot of details,
> but the key thing was setting up the Flink taskworker(s) to run docker.
> This requires running docker-in-docker, as the taskworker itself is a
> docker container in k8s.
>
> First, create a custom Flink container with docker installed:
>
>     # docker-flink Dockerfile
>     FROM flink:1.10
>     # install docker
>     RUN apt-get ...
>
> Then set up the taskmanager deployment to use a sidecar docker-in-docker
> service. This dind service is where the Python SDK harness container
> actually runs.
>
>     kind: Deployment
>     ...
>     containers:
>     - name: docker
>       image: docker:19.03.5-dind
>       ...
>     - name: taskmanager
>       image: myregistry:5000/docker-flink:1.10
>       env:
>       - name: DOCKER_HOST
>         value: tcp://localhost:2375
>       ...
>
> I quickly threw all these pieces together in a repo here:
> https://github.com/sambvfx/beam-flink-k8s
>
> I added a working (via minikube) step-by-step guide in the README to
> prove to myself that I didn't miss anything, but feel free to submit any
> PRs if you want to add anything useful.
>
> The documents you linked are very informative. It would be great to
> aggregate all this into digestible documentation. Let me know if you
> have any further questions!
>
> Cheers,
> Sam
>
> On Thu, Aug 27, 2020 at 10:25 AM Eugene Kirpichov <ekirpic...@gmail.com>
> wrote:
>
>> Hi Kyle,
>>
>> Thanks for the response!
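[Editor's note: Sam's Dockerfile elides the actual install step (`RUN apt-get ...`). A minimal sketch of what it might contain, assuming the Debian-based `flink:1.10` base image and Debian's `docker.io` package (the package choice and cleanup are assumptions, not from the thread):]

```Dockerfile
# docker-flink Dockerfile (sketch; the install step is an assumption)
FROM flink:1.10

# Install the Docker client from Debian's repositories so the taskmanager
# can reach the dind sidecar's daemon via DOCKER_HOST.
RUN apt-get update && \
    apt-get install -y --no-install-recommends docker.io && \
    rm -rf /var/lib/apt/lists/*
```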
>>
>> On Wed, Aug 26, 2020 at 5:28 PM Kyle Weaver <kcwea...@google.com> wrote:
>>
>>> > - With the Flink operator, I was able to submit a Beam job, but hit
>>> the issue that I need Docker installed on my Flink nodes. I haven't yet
>>> tried changing the operator's yaml files to add Docker inside them.
>>>
>>> Running Beam workers via Docker on the Flink nodes is not recommended
>>> (and probably not even possible), since the Flink nodes are themselves
>>> already running inside Docker containers. Running workers as sidecars
>>> avoids that problem. For example:
>>> https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/examples/beam/with_job_server/beam_flink_cluster.yaml#L17-L20
>>>
>> The main problem with the sidecar approach is that I can't use the
>> Flink cluster as a "service" for anybody to submit their jobs with
>> custom containers - the container version is fixed.
>> Do I understand it correctly?
>> Seems like the Docker-in-Docker approach is viable, and is mentioned in
>> the Beam Flink K8s design doc
>> <https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.dtj1gnks47dq>.
>>
>>> > I also haven't tried this
>>> <https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/beam_guide.md>
>>> yet because it implies submitting jobs using "kubectl apply", which is
>>> weird - why not just submit it through the Flink job server?
>>>
>>> I'm guessing it goes through k8s for monitoring purposes. I see no
>>> reason it shouldn't be possible to submit to the job server directly
>>> through Python, network permitting, though I haven't tried this.
>>>
>>> On Wed, Aug 26, 2020 at 4:10 PM Eugene Kirpichov <ekirpic...@gmail.com>
>>> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> I'm still working with Pachama <https://pachama.com/> right now; we
>>>> have a Kubernetes Engine cluster on GCP and want to run Beam Python
>>>> batch pipelines with custom containers against it.
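[Editor's note: the direct job-server submission Kyle suggests would be expressed through Beam's portable-runner pipeline options. A sketch of the flags involved; the flag names are Beam's, but the endpoint, port, and image name below are illustrative assumptions:]

```python
# Sketch: pipeline flags for submitting directly to a Flink job server
# from Python, rather than via "kubectl apply". The job server's
# host:port and the harness image are assumptions for illustration.
def job_server_args(job_endpoint="localhost:8099",
                    environment_type="DOCKER",
                    environment_config=None):
    """Build command-line flags for direct job-server submission."""
    args = [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        f"--environment_type={environment_type}",
    ]
    # For a custom SDK harness container, point environment_config at it.
    if environment_config:
        args.append(f"--environment_config={environment_config}")
    return args

print(job_server_args(environment_config="myregistry:5000/beam-python:2.23"))
```

These flags would be passed to the pipeline's `PipelineOptions`, network permitting, exactly as Kyle describes.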
>>>> Flink and Cloud Dataflow are the two options; Cloud Dataflow doesn't
>>>> support custom containers for batch pipelines yet, so we're going
>>>> with Flink.
>>>>
>>>> I'm struggling to find complete documentation on how to do this.
>>>> There seems to be lots of conflicting or incomplete information:
>>>> several ways to deploy Flink, several ways to get Beam working with
>>>> it, bizarre StackOverflow questions, and no documentation explaining
>>>> a complete working example.
>>>>
>>>> == My requests ==
>>>> * Could people briefly share their working setup? It would be good to
>>>> know which directions are promising.
>>>> * It would be particularly helpful if someone could volunteer an hour
>>>> of their time to talk to me about their working Beam/Flink/k8s setup.
>>>> It's for a good cause (fixing the planet :) ), and on my side I
>>>> volunteer to write up the findings to share with the community so
>>>> others suffer less.
>>>>
>>>> == Appendix: My findings so far ==
>>>> There are multiple ways to deploy Flink on k8s:
>>>> - The GCP marketplace Flink operator
>>>> <https://cloud.google.com/blog/products/data-analytics/open-source-processing-engines-for-kubernetes>
>>>> (couldn't get it to work) and the respective CLI version
>>>> <https://github.com/GoogleCloudPlatform/flink-on-k8s-operator>
>>>> (buggy, but I got it working)
>>>> - https://github.com/lyft/flinkk8soperator (haven't tried)
>>>> - Flink's native k8s support
>>>> <https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/native_kubernetes.html>
>>>> (super easy to get working)
>>>>
>>>> I confirmed that my Flink cluster was operational by running a simple
>>>> Wordcount job, initiated from my machine.
>>>> However, I wasn't yet able to get Beam working:
>>>>
>>>> - With the Flink operator, I was able to submit a Beam job, but hit
>>>> the issue that I need Docker installed on my Flink nodes. I haven't
>>>> yet tried changing the operator's yaml files to add Docker inside
>>>> them. I also haven't tried this
>>>> <https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/beam_guide.md>
>>>> yet, because it implies submitting jobs using "kubectl apply", which
>>>> is weird - why not just submit it through the Flink job server?
>>>>
>>>> - With Flink's native k8s support, I tried two things:
>>>>   - Creating a fat portable jar using --output_executable_path. The
>>>> jar is huge (200+ MB) and takes forever to upload to my Flink cluster
>>>> - this is a non-starter. But if I actually upload it, then I hit the
>>>> same issue with lacking Docker. Haven't tried fixing it yet.
>>>>   - Simply running my pipeline with --runner=FlinkRunner
>>>> --environment_type=DOCKER --flink_master=$PUBLIC_IP:8081. The Java
>>>> process appears to send 1+ GB of data somewhere, but the job never
>>>> even starts.
>>>>
>>>> I looked at a few conference talks:
>>>> - https://www.cncf.io/wp-content/uploads/2020/02/CNCF-Webinar_-Apache-Flink-on-Kubernetes-Operator-1.pdf
>>>> - seems to imply that I need to add a Beam worker "sidecar" to the
>>>> Flink workers, and that I need to submit my job using "kubectl apply"
>>>> - https://www.youtube.com/watch?v=8k1iezoc5Sc - also mentions the
>>>> sidecar, but mentions the fat jar option as well
>>>>
>>>> --
>>>> Eugene Kirpichov
>>>> http://www.linkedin.com/in/eugenekirpichov
>>>
>>
>> --
>> Eugene Kirpichov
>> http://www.linkedin.com/in/eugenekirpichov
>

--
Eugene Kirpichov
http://www.linkedin.com/in/eugenekirpichov
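[Editor's note: the thread contrasts two ways of running the Beam SDK harness on Flink workers - the sidecar approach Kyle recommends and the docker-in-docker approach Sam implemented. A sketch of the corresponding Beam environment flags; the flag names are Beam's, but the sidecar port is an assumption taken from common worker-pool examples:]

```python
# Sketch of the two worker-environment choices discussed in the thread.
# "sidecar": Beam's EXTERNAL environment points the runner at a worker
#   pool container running beside each taskmanager (fixed SDK container,
#   hence Eugene's "can't use the cluster as a service" concern).
# "dind": the taskmanager can reach a Docker daemon (Sam's dind sidecar),
#   so the runner launches an arbitrary harness container itself.
def environment_args(mode):
    if mode == "sidecar":
        # localhost:50000 is an assumed worker-pool address.
        return ["--environment_type=EXTERNAL",
                "--environment_config=localhost:50000"]
    if mode == "dind":
        return ["--environment_type=DOCKER"]
    raise ValueError(f"unknown mode: {mode}")
```

With the sidecar, the worker pool must be reachable from the taskmanager, which is why it runs in the same pod; with dind, the harness image can vary per job.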