I've come across similar issues when trying to set up Flink on Amazon EC2 instances. Could there be something wrong with my setup? Here is the flink-conf.yaml I am using:
https://gist.githubusercontent.com/GEOFBOT/3ffc9b21214174ae750cc3fdb2625b71/raw/flink-conf.yaml
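For reference, the temp-directory setting I suspect is relevant looks roughly like the fragment below. (The key name is the standard Flink 1.x option, and the path is my guess based on the dist-cache path in the error further down; the actual values are in the gist above.)

```yaml
# flink-conf.yaml (excerpt, values inferred from the error path below)
taskmanager.tmp.dirs: /home/bluedata/flink/tmp
```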
Thanks,
Geoffrey

On Wed, Jul 13, 2016 at 1:15 PM Geoffrey Mon <geof...@gmail.com> wrote:

> Hello,
>
> Here is the TaskManager log on pastebin:
> http://pastebin.com/XAJ56gn4
>
> I will look into whether the files were created.
>
> By the way, the cluster is made with virtual machines running on BlueData
> EPIC. I don't know if that might be related to the problem.
>
> Thanks,
> Geoffrey
>
>
> On Wed, Jul 13, 2016 at 6:12 AM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> Hello Geoffrey,
>>
>> How often does this occur?
>>
>> Flink distributes the user code and the Python library using the
>> Distributed Cache.
>>
>> Either the file is deleted right after being created for some reason, or
>> the DC returns a file name before the file was created (which shouldn't
>> happen; it should block until the file is available).
>>
>> If you are up to debugging this, I would suggest looking into the
>> FileCache class and verifying whether the file in question is in fact
>> created.
>>
>> The logs of the TaskManager on which the exception occurs could be of
>> interest too; could you send them to me?
>>
>> Regards,
>> Chesnay
>>
>>
>> On 13.07.2016 04:11, Geoffrey Mon wrote:
>>
>> Hello all,
>>
>> I've set up Flink on a very small cluster of one master node and five
>> worker nodes, following the instructions in the documentation
>> (https://ci.apache.org/projects/flink/flink-docs-master/setup/cluster_setup.html).
>> I can run the included examples like WordCount and PageRank across the
>> entire cluster, but when I try to run simple Python examples, I sometimes
>> get a strange error on the first PythonMapPartition about the temporary
>> folders that contain the streams of data between Python and Java.
>>
>> If I run jobs on only the taskmanager on the master node, Python examples
>> run fine. However, if the jobs use the worker nodes, then I get the
>> following error:
>>
>> org.apache.flink.client.program.ProgramInvocationException: The program
>> execution failed: Job execution failed.
>> at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:378)
>> <snip>
>> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job
>> execution failed.
>> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:806)
>> <snip>
>> Caused by: java.lang.Exception: The user defined 'open()' method caused
>> an exception: External process for task MapPartition (PythonMap) terminated
>> prematurely.
>> python: can't open file
>> '/home/bluedata/flink/tmp/flink-dist-cache-d1958549-3d58-4554-b286-cb4b1cdb9060/52feefa8892e61ba9a187f7684c84ada/flink/plan.py':
>> [Errno 2] No such file or directory
>> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:481)
>> <snip>
>> Caused by: java.lang.RuntimeException: External process for task
>> MapPartition (PythonMap) terminated prematurely.
>> python: can't open file
>> '/home/bluedata/flink/tmp/flink-dist-cache-d1958549-3d58-4554-b286-cb4b1cdb9060/52feefa8892e61ba9a187f7684c84ada/flink/plan.py':
>> [Errno 2] No such file or directory
>> at org.apache.flink.python.api.streaming.data.PythonStreamer.startPython(PythonStreamer.java:144)
>> at org.apache.flink.python.api.streaming.data.PythonStreamer.open(PythonStreamer.java:92)
>> at org.apache.flink.python.api.functions.PythonMapPartition.open(PythonMapPartition.java:48)
>> at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:38)
>> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:477)
>> ... 5 more
>>
>> I suspect this issue has something to do with the data being sent
>> between the master and the workers, but I haven't been able to find any
>> solutions. Presumably the temporary files weren't received properly and
>> thus were not created properly?
>>
>> Thanks in advance.
>>
>> Cheers,
>> Geoffrey
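P.S. To test Chesnay's theory about whether the dist-cache file is ever actually created, I plan to run a small watcher script on a worker node while the job starts. This is just my own sketch, not anything from Flink; the glob pattern is guessed from the path in the error above, and the hash directories presumably change on every run.

```python
import glob
import os
import time


def wait_for_file(path, timeout=10.0, interval=0.1):
    """Poll until `path` exists or the timeout expires.

    Returns True if the file appeared within the timeout, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return os.path.exists(path)


if __name__ == "__main__":
    # Pattern guessed from the stack trace; both hash directories vary per job.
    pattern = "/home/bluedata/flink/tmp/flink-dist-cache-*/*/flink/plan.py"
    for p in glob.glob(pattern):
        print(p, "exists" if os.path.exists(p) else "missing")
```

If the file never shows up while the job is starting, that would point at the Distributed Cache never writing it; if it shows up and then disappears, something is deleting it after creation.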