Hi Andrew,

>> By pip installing apache-flink, this docker image will have the flink
>> distro installed at /opt/flink and FLINK_HOME set to /opt/flink
>> <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>.
>> BUT ALSO flink lib jars will be installed at e.g.
>> /usr/local/lib/python3.7/dist-packages/pyflink/lib!
>> So, by following those instructions, flink is effectively installed
>> twice into the docker image.

Yes, your understanding is correct. The base image `flink:1.15.2` doesn't
include PyFlink, so you need to build a custom image if you want to use
PyFlink.

Regarding the JAR packages that are installed twice: you could manually
remove the JARs located under
/usr/local/lib/python3.7/dist-packages/pyflink/lib after
`pip install apache-flink`. PyFlink will then use the JARs located under
$FLINK_HOME/lib.

>> Is using pyflink from the flink distribution tarball (without pip) not a
>> supported way to use pyflink?

You are right: using PyFlink straight from the distribution tarball is not
supported.

Regards,
Dian

On Thu, Jan 26, 2023 at 11:12 PM Andrew Otto <o...@wikimedia.org> wrote:

> Ah, oops and my original email had a typo:
>
> Some python dependencies are not included in the flink distribution
> tarballs: cloudpickle, py4j and pyflink are in opt/python.
>
> Should read:
>
> Some python dependencies ARE included in the flink distribution
> tarballs: cloudpickle, py4j and pyflink are in opt/python.
>
> On Thu, Jan 26, 2023 at 10:10 AM Andrew Otto <o...@wikimedia.org> wrote:
>
>> Let me ask a related question:
>>
>> We are building our own base Flink docker image. We will be deploying
>> both JVM and python apps via flink-kubernetes-operator.
>>
>> Is there any reason not to install Flink in this image via `pip install
>> apache-flink` and use it for JVM apps?
>>
>> -Andrew Otto
>> Wikimedia Foundation
>>
>> On Tue, Jan 24, 2023 at 4:26 PM Andrew Otto <o...@wikimedia.org> wrote:
>>
>>> Hello,
>>>
>>> I'm having quite a bit of trouble running pyflink from the default flink
>>> distribution tarballs. I'd expect the python examples to work as long as
>>> python is installed, and we've got the distribution. Some python
>>> dependencies are not included in the flink distribution tarballs:
>>> cloudpickle, py4j and pyflink are in opt/python. Others are not, e.g.
>>> protobuf.
>>>
>>> Now that I'm looking, I see that the pyflink installation instructions
>>> <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/python/installation/>
>>> are to install via pip.
>>>
>>> I'm doing this in Docker for use with the flink-kubernetes-operator. In
>>> the Using Flink Python on Docker
>>> <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker>
>>> instructions, there is a pip3 install apache-flink step. I find this
>>> strange, since I'd expect the 'FROM flink:1.15.2' part to be sufficient.
>>>
>>> By pip installing apache-flink, this docker image will have the flink
>>> distro installed at /opt/flink and FLINK_HOME set to /opt/flink
>>> <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>.
>>> BUT ALSO flink lib jars will be installed at e.g.
>>> /usr/local/lib/python3.7/dist-packages/pyflink/lib!
>>> So, by following those instructions, flink is effectively installed
>>> twice into the docker image.
>>>
>>> Am I correct or am I missing something?
>>>
>>> Is using pyflink from the flink distribution tarball (without pip) not a
>>> supported way to use pyflink?
>>>
>>> Thanks!
>>> -Andrew Otto
>>> Wikimedia Foundation
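
For the manual cleanup suggested in the reply (removing the JARs that pip places under pyflink/lib so that only the copies in $FLINK_HOME/lib remain), here is a minimal sketch in Python. The helper name `remove_bundled_jars` is hypothetical and not part of PyFlink. It assumes the JARs sit directly in the directory you pass in and that FLINK_HOME already points at a complete distribution, as in the official image.

```python
import glob
import os
import tempfile


def remove_bundled_jars(pyflink_lib_dir):
    """Delete every *.jar directly under pyflink_lib_dir; return the removed paths.

    Hypothetical helper for illustration only, not a PyFlink API.
    """
    removed = []
    for jar_path in sorted(glob.glob(os.path.join(pyflink_lib_dir, "*.jar"))):
        os.remove(jar_path)
        removed.append(jar_path)
    return removed


if __name__ == "__main__":
    # Demo on a throwaway directory so nothing real is deleted.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("flink-dist.jar", "README.txt"):
            open(os.path.join(demo_dir, name), "w").close()
        remove_bundled_jars(demo_dir)
        print(os.listdir(demo_dir))
```

In an image build, the directory to pass would be the pip-installed pyflink/lib, for example the /usr/local/lib/python3.7/dist-packages/pyflink/lib path mentioned in the thread. That path depends on the Python version in the image, so it is worth checking before deleting anything.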