Hi Andrew,

>> By pip installing apache-flink, this docker image will have the flink
>> distro installed at /opt/flink and FLINK_HOME set to /opt/flink
>> <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>.
>> BUT ALSO flink lib jars will be installed at e.g.
>> /usr/local/lib/python3.7/dist-packages/pyflink/lib!
>> So, by following those instructions, flink is effectively installed twice
>> into the docker image.

Yes, your understanding is correct. The base image `flink:1.15.2` doesn't
include PyFlink, so you need to build a custom image if you want to use
PyFlink. Regarding the JAR packages that are installed twice, you could
manually remove the JARs located under
/usr/local/lib/python3.7/dist-packages/pyflink/lib after `pip install
apache-flink`. PyFlink will then use the JAR packages located under
$FLINK_HOME/lib.
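
For example, the cleanup could be scripted roughly like this. It is only a
sketch: the helper names `locate_pyflink_dir` and `remove_bundled_jars` are
made up for illustration, and it assumes $FLINK_HOME/lib already contains the
distribution JARs (it refuses to delete anything otherwise).

```python
import importlib.util
from pathlib import Path


def locate_pyflink_dir():
    """Return the directory of the pip-installed pyflink package, or None.

    Looking the package up avoids hard-coding the Python version in the path.
    """
    spec = importlib.util.find_spec("pyflink")
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


def remove_bundled_jars(pyflink_dir, flink_home):
    """Delete *.jar files directly under <pyflink_dir>/lib.

    Hypothetical helper: it only deletes when <flink_home>/lib holds at least
    one JAR, so the image is never left without any Flink JARs at all.
    Returns the names of the removed files.
    """
    bundled_lib = Path(pyflink_dir) / "lib"
    dist_lib = Path(flink_home) / "lib"

    if not any(dist_lib.glob("*.jar")):
        raise RuntimeError(f"no JARs found in {dist_lib}; keeping bundled copies")

    removed = []
    for jar in sorted(bundled_lib.glob("*.jar")):
        jar.unlink()
        removed.append(jar.name)
    return removed
```

In a custom image this would run once, right after the `pip install
apache-flink` step, with something like
`remove_bundled_jars(locate_pyflink_dir(), "/opt/flink")`.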

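>> Is using pyflink from the flink distribution tarball (without pip) not a
>> supported way to use pyflink?

You are right: using PyFlink directly from the distribution tarball (without
pip) is not a supported way to use it.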

Regards,
Dian


On Thu, Jan 26, 2023 at 11:12 PM Andrew Otto <o...@wikimedia.org> wrote:

> Ah, oops and my original email had a typo:
> > Some python dependencies are not included in the flink distribution
> tarballs: cloudpickle, py4j and pyflink are in opt/python.
>
> Should read:
> > Some python dependencies ARE included in the flink distribution
> tarballs: cloudpickle, py4j and pyflink are in opt/python.
>
> On Thu, Jan 26, 2023 at 10:10 AM Andrew Otto <o...@wikimedia.org> wrote:
>
>> Let me ask a related question:
>>
>> We are building our own base Flink docker image.  We will be deploying
>> both JVM and python apps via flink-kubernetes-operator.
>>
>> Is there any reason not to install Flink in this image via `pip install
>> apache-flink` and use it for JVM apps?
>>
>> -Andrew Otto
>>  Wikimedia Foundation
>>
>>
>>
>> On Tue, Jan 24, 2023 at 4:26 PM Andrew Otto <o...@wikimedia.org> wrote:
>>
>>> Hello,
>>>
>>> I'm having quite a bit of trouble running pyflink from the default flink
>>> distribution tarballs.  I'd expect the python examples to work as long as
>>> python is installed, and we've got the distribution.  Some python
>>> dependencies are not included in the flink distribution tarballs:
>>> cloudpickle, py4j and pyflink are in opt/python.  Others are not, e.g.
>>> protobuf.
>>>
>>> Now that I'm looking, I see that the pyflink installation instructions
>>> <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/python/installation/>
>>> are to install via pip.
>>>
>>> I'm doing this in Docker for use with the flink-kubernetes-operator.  In
>>> the Using Flink Python on Docker
>>> <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker>
>>> instructions, there is a pip3 install apache-flink step.  I find this
>>> strange, since I'd expect the 'FROM flink:1.15.2' part to be sufficient.
>>>
>>> By pip installing apache-flink, this docker image will have the flink
>>> distro installed at /opt/flink and FLINK_HOME set to /opt/flink
>>> <https://github.com/apache/flink-docker/blob/master/1.16/scala_2.12-java11-ubuntu/Dockerfile>.
>>> BUT ALSO flink lib jars will be installed at e.g.
>>> /usr/local/lib/python3.7/dist-packages/pyflink/lib!
>>> So, by following those instructions, flink is effectively installed
>>> twice into the docker image.
>>>
>>> Am I correct or am I missing something?
>>>
>>> Is using pyflink from the flink distribution tarball (without pip) not a
>>> supported way to use pyflink?
>>>
>>> Thanks!
>>> -Andrew Otto
>>>  Wikimedia Foundation
>>>
>>>
