Hi Regina,

Judging from the exception you posted, this is not about storing the file in HDFS but about a step before that: the BlobServer first writes the incoming file to its local file system, in the directory given by the `blob.storage.directory` configuration property. If this property is not set or is empty, it falls back to `java.io.tmpdir`. The BlobServer creates a subdirectory `blobStore-<UUID>` and puts incoming files into `<storage-dir>/blobStore-<UUID>/incoming` with file names like `temp-12345678` (using an atomic file counter). It seems that there is no space left on the file system that holds this directory.
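To check the free space quickly, something like the following could be run on the JobManager host. It is only a sketch of the fallback described above, not the BlobServer's actual code: the class name `BlobDirSpaceCheck` and the helper `resolveStorageDir` are made up for illustration, and the directory is passed as a command-line argument standing in for `blob.storage.directory`.

```java
import java.io.File;

public class BlobDirSpaceCheck {

    // Hypothetical helper: use the configured directory if it is set,
    // otherwise fall back to java.io.tmpdir (as described above).
    public static File resolveStorageDir(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return new File(System.getProperty("java.io.tmpdir"));
        }
        return new File(configured);
    }

    public static void main(String[] args) {
        // The first argument stands in for the value of blob.storage.directory.
        File dir = resolveStorageDir(args.length > 0 ? args[0] : null);

        // getUsableSpace() reports the bytes available to this JVM on the
        // partition that contains the directory.
        long usableMiB = dir.getUsableSpace() / (1024L * 1024L);
        System.out.println(dir.getAbsolutePath() + ": " + usableMiB + " MiB usable");
    }
}
```

Run it as the same user as the JobManager, so that `java.io.tmpdir` and any quotas match what the BlobServer sees.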
If you set the log level to INFO, you should see a message like "Created BLOB server storage directory ..." with the path. Can you double check whether there is really no space left there?

Nico

On 12/12/17 08:02, Chan, Regina wrote:
> And if it helps, I’m running on flink 1.2.1. I saw this ticket:
> https://issues.apache.org/jira/browse/FLINK-5828 It only started
> happening when I was running all 50 flows at the same time. However, it
> looks like it’s not an issue with creating the cache directory but with
> running out of space there? But what’s in there is also tiny.
>
> bash-4.1$ hdfs dfs -du -h hdfs://d191291/user/delp/.flink/application_1510733430616_2098853
> 1.1 K    hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/5c71e4b6-2567-4d34-98dc-73b29c502736-taskmanager-conf.yaml
> 1.4 K    hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/flink-conf.yaml
> 93.5 M   hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/flink-dist_2.10-1.2.1.jar
> 264.8 M  hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/lib
> 1.9 K    hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/log4j.properties
>
> *From:* Chan, Regina [Tech]
> *Sent:* Tuesday, December 12, 2017 1:56 AM
> *To:* 'user@flink.apache.org'
> *Subject:* ProgramInvocationException: Could not upload the jar files to
> the job manager / No space left on device
>
> Hi,
>
> I’m currently submitting 50 separate jobs to a 50TM, 1 slot set up. Each
> job has 1 parallelism. There’s plenty of space left in my cluster and on
> that node. It’s not clear to me what’s happening. Any pointers?
>
> On the client side, when I try to execute, I see the following:
>
> org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Could not upload the jar files to the job manager.
>     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
>     at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101)
>     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
>     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)
>     at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
>     at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926)
>     at com.gs.ep.da.lake.refinerlib.flink.FlowData.execute(FlowData.java:143)
>     at com.gs.ep.da.lake.refinerlib.flink.FlowData.flowPartialIngestionHalf(FlowData.java:107)
>     at com.gs.ep.da.lake.refinerlib.flink.FlowData.call(FlowData.java:72)
>     at com.gs.ep.da.lake.refinerlib.flink.FlowData.call(FlowData.java:39)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Could not upload the jar files to the job manager.
>     at org.apache.flink.runtime.client.JobSubmissionClientActor$1.call(JobSubmissionClientActor.java:150)
>     at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:95)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>     at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.io.IOException: Could not retrieve the JobManager's blob port.
>     at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:745)
>     at org.apache.flink.runtime.jobgraph.JobGraph.uploadUserJars(JobGraph.java:565)
>     at org.apache.flink.runtime.client.JobSubmissionClientActor$1.call(JobSubmissionClientActor.java:148)
>     ... 9 more
> Caused by: java.io.IOException: PUT operation failed: Connection reset
>     at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:512)
>     at org.apache.flink.runtime.blob.BlobClient.put(BlobClient.java:374)
>     at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:771)
>     at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:740)
>     ... 11 more
> Caused by: java.net.SocketException: Connection reset
>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
>     at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
>     at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:499)
>     ... 14 more
>
> On the job manager logs I see this:
>
> 2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection - PUT operation failed
> java.io.IOException: No space left on device
>     at java.io.FileOutputStream.writeBytes(Native Method)
>     at java.io.FileOutputStream.write(FileOutputStream.java:345)
>     at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
>     at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
> 2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection - PUT operation failed
> java.io.IOException: No space left on device
>     at java.io.FileOutputStream.writeBytes(Native Method)
>     at java.io.FileOutputStream.write(FileOutputStream.java:345)
>     at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
>     at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
> 2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection - PUT operation failed
> java.io.IOException: No space left on device
>     at java.io.FileOutputStream.writeBytes(Native Method)
>     at java.io.FileOutputStream.write(FileOutputStream.java:345)
>     at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
>     at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
> 2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection - PUT operation failed
> java.io.IOException: No space left on device
>
> *Regina Chan*
> *Goldman Sachs* – Enterprise Platforms, Data Architecture
> *30 Hudson Street, 37th floor | Jersey City, NY 07302* | (212) 902-5697