Hello all,

I've set up Flink on a very small cluster of one master node and five
worker nodes, following the instructions in the documentation (
https://ci.apache.org/projects/flink/flink-docs-master/setup/cluster_setup.html).
I can run the included examples like WordCount and PageRank across the
entire cluster, but when I try to run simple Python examples, I sometimes
get a strange error on the first PythonMapPartition about the temporary
folders that contain the streams of data between Python and Java.

If I run jobs on only the taskmanager on the master node, Python examples
run fine. However, if the jobs use the worker nodes, then I get the
following error:

org.apache.flink.client.program.ProgramInvocationException: The program
execution failed: Job execution failed.
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:378)
<snip>
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job
execution failed.
at
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:806)
<snip>
Caused by: java.lang.Exception: The user defined 'open()' method caused an
exception: External process for task MapPartition (PythonMap) terminated
prematurely.
python: can't open file
'/home/bluedata/flink/tmp/flink-dist-cache-d1958549-3d58-4554-b286-cb4b1cdb9060/52feefa8892e61ba9a187f7684c84ada/flink/plan.py':
[Errno 2] No such file or directory
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:481)
<snip>
Caused by: java.lang.RuntimeException: External process for task
MapPartition (PythonMap) terminated prematurely.
python: can't open file
'/home/bluedata/flink/tmp/flink-dist-cache-d1958549-3d58-4554-b286-cb4b1cdb9060/52feefa8892e61ba9a187f7684c84ada/flink/plan.py':
[Errno 2] No such file or directory
at
org.apache.flink.python.api.streaming.data.PythonStreamer.startPython(PythonStreamer.java:144)
at
org.apache.flink.python.api.streaming.data.PythonStreamer.open(PythonStreamer.java:92)
at
org.apache.flink.python.api.functions.PythonMapPartition.open(PythonMapPartition.java:48)
at
org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:38)
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:477)
... 5 more

I'm suspecting this issue has something to do with the data sending between
the master and the workers, but I haven't been able to find any solutions.
Presumably the temporary files weren't received properly and thus were not
created properly?

Thanks in advance.

Cheers,
Geoffrey

Reply via email to