I've run into similar issues when trying to set up Flink on Amazon EC2
instances. Is there perhaps something wrong with my setup? Here is the
flink-conf.yaml I am using:
https://gist.githubusercontent.com/GEOFBOT/3ffc9b21214174ae750cc3fdb2625b71/raw/flink-conf.yaml
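
The settings I would guess matter most here are the temporary-directory ones,
since the flink-dist-cache-* folders from the error further down in this
thread are, I believe, created under taskmanager.tmp.dirs. As a rough sketch
(placeholder values only, not necessarily what is in the gist), the relevant
keys look like:

jobmanager.rpc.address: <master-node-hostname>
taskmanager.tmp.dirs: /path/to/flink/tmp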

Thanks,
Geoffrey

On Wed, Jul 13, 2016 at 1:15 PM Geoffrey Mon <geof...@gmail.com> wrote:

> Hello,
>
> Here is the TaskManager log on pastebin:
> http://pastebin.com/XAJ56gn4
>
> I will look into whether the files were created.
>
> By the way, the cluster consists of virtual machines running on BlueData
> EPIC. I don't know whether that might be related to the problem.
>
> Thanks,
> Geoffrey
>
>
> On Wed, Jul 13, 2016 at 6:12 AM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> Hello Geoffrey,
>>
>> How often does this occur?
>>
>> Flink distributes the user-code and the python library using the
>> Distributed Cache.
>>
>> Either the file is deleted right after being created for some reason, or
>> the DC returns a file name before the file was created (which shouldn't
>> happen; it should block until the file is available).
>>
>> If you are up to debugging this, I would suggest looking into the FileCache
>> class and verifying whether the file in question is in fact created.
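>>
>> As a quick external check (without touching FileCache), something along
>> these lines, run on a worker node while the job starts, would show whether
>> plan.py is ever created under the TaskManager tmp directory and whether it
>> disappears again. This is only an untested sketch; the tmp path and the
>> directory layout are assumptions taken from your stack trace:
>>
>> # untested sketch: watch the TaskManager tmp directory and report when the
>> # distributed-cache copy of plan.py appears or disappears
>> import glob
>> import sys
>> import time
>>
>> tmp_dir = sys.argv[1]  # e.g. /home/bluedata/flink/tmp (from the error)
>> previous = set()
>> for _ in range(600):  # poll roughly every 100 ms for about a minute
>>     current = set(glob.glob(tmp_dir + "/flink-dist-cache-*/*/flink/plan.py"))
>>     for path in current - previous:
>>         print("%f created %s" % (time.time(), path))
>>     for path in previous - current:
>>         print("%f removed %s" % (time.time(), path))
>>     previous = current
>>     time.sleep(0.1)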
>>
>> The logs of the TaskManager on which the exception occurs could be of
>> interest too; could you send them to me?
>>
>> Regards,
>> Chesnay
>>
>>
>> On 13.07.2016 04:11, Geoffrey Mon wrote:
>>
>> Hello all,
>>
>> I've set up Flink on a very small cluster of one master node and five
>> worker nodes, following the instructions in the documentation (
>> https://ci.apache.org/projects/flink/flink-docs-master/setup/cluster_setup.html).
>> I can run the included examples like WordCount and PageRank across the
>> entire cluster, but when I try to run simple Python examples, I sometimes
>> get a strange error on the first PythonMapPartition about the temporary
>> folders that contain the streams of data between Python and Java.
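>>
>> For reference, the failing jobs are essentially as small as the sketch
>> below. It is only illustrative: the exact operator signatures differ
>> between Flink versions (some releases require explicit type constants on
>> the operators), so treat the details as assumptions rather than the exact
>> script I am running.
>>
>> # minimal sketch of the kind of Python job that triggers the error
>> from flink.plan.Environment import get_environment
>> from flink.functions.MapFunction import MapFunction
>>
>> class Doubler(MapFunction):
>>     def map(self, value):
>>         return value * 2
>>
>> if __name__ == "__main__":
>>     env = get_environment()
>>     env.from_elements(1, 2, 3, 4, 5) \
>>        .map(Doubler()) \
>>        .output()
>>     env.execute(local=False)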
>>
>> If I run jobs only on the TaskManager on the master node, Python examples
>> run fine. However, if the jobs use the worker nodes, then I get the
>> following error:
>>
>> org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job execution failed.
>>   at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:378)
>>   <snip>
>> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>>   at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:806)
>>   <snip>
>> Caused by: java.lang.Exception: The user defined 'open()' method caused an exception: External process for task MapPartition (PythonMap) terminated prematurely.
>> python: can't open file '/home/bluedata/flink/tmp/flink-dist-cache-d1958549-3d58-4554-b286-cb4b1cdb9060/52feefa8892e61ba9a187f7684c84ada/flink/plan.py': [Errno 2] No such file or directory
>>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:481)
>>   <snip>
>> Caused by: java.lang.RuntimeException: External process for task MapPartition (PythonMap) terminated prematurely.
>> python: can't open file '/home/bluedata/flink/tmp/flink-dist-cache-d1958549-3d58-4554-b286-cb4b1cdb9060/52feefa8892e61ba9a187f7684c84ada/flink/plan.py': [Errno 2] No such file or directory
>>   at org.apache.flink.python.api.streaming.data.PythonStreamer.startPython(PythonStreamer.java:144)
>>   at org.apache.flink.python.api.streaming.data.PythonStreamer.open(PythonStreamer.java:92)
>>   at org.apache.flink.python.api.functions.PythonMapPartition.open(PythonMapPartition.java:48)
>>   at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:38)
>>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:477)
>>   ... 5 more
>>
>> I suspect this issue has something to do with how data is sent between the
>> master and the workers, but I haven't been able to find any solutions.
>> Perhaps the temporary files were not received properly and thus were never
>> created on the workers?
>>
>> Thanks in advance.
>>
>> Cheers,
>> Geoffrey
>>
>>
>>
