Hi Regina,
judging from the exception you posted, this is not about storing the
file in HDFS, but a step before that where the BlobServer first puts the
incoming file into its local file system in the directory given by the
`blob.storage.directory` configuration property. If this property is not
set or empty, it will fall back to `java.io.tmpdir`. The BlobServer
creates a subdirectory `blobStore-<UUID>` and puts incoming files into
`<storage-dir>/blobStore-<UUID>/incoming` with file names
`temp-12345678` (using an atomic file counter). It seems that there is
no space left in the filesystem of this directory.

If you set the log level to INFO, you should see a message like "Created
BLOB server storage directory ..." with the path. Can you double check
whether there is really no space left there?
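
A minimal sketch of that check, in case it helps. Run it on the JobManager host. It is an illustration only, not part of Flink: the function names, the `BLOB_STORAGE_DIR` environment variable, and the `/tmp` fallback (a typical `java.io.tmpdir` value on Linux) are assumptions of mine. Pass in whatever your `flink-conf.yaml` and JVM actually use. It also reports free inodes where the platform supports it, since a filesystem that has run out of inodes can raise "No space left on device" even when free bytes remain.

```python
import os
import shutil


def resolve_blob_dir(configured, jvm_tmpdir):
    """Mirror the fallback described above: use the configured
    blob.storage.directory, or the JVM temp dir if it is unset or empty."""
    if not configured:
        return jvm_tmpdir
    return configured


def space_report(path):
    """Report free and total bytes (and free inodes on Unix) for the
    filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    report = {
        "path": path,
        "free_bytes": usage.free,
        "total_bytes": usage.total,
    }
    if hasattr(os, "statvfs"):  # not available on Windows
        report["free_inodes"] = os.statvfs(path).f_favail
    return report


if __name__ == "__main__":
    # BLOB_STORAGE_DIR is a hypothetical variable used only for this sketch;
    # "/tmp" stands in for the JVM's java.io.tmpdir.
    storage_dir = resolve_blob_dir(os.environ.get("BLOB_STORAGE_DIR"), "/tmp")
    print(space_report(storage_dir))
```

The interesting number is the free space of the filesystem containing `<storage-dir>/blobStore-<UUID>/incoming`, compared against the combined size of the jars being uploaded at the same time.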


Nico

On 12/12/17 08:02, Chan, Regina wrote:
> And if it helps, I’m running on flink 1.2.1. I saw this ticket:
> https://issues.apache.org/jira/browse/FLINK-5828 It only started
> happening when I was running all 50 flows at the same time. However, it
> looks like it’s not an issue with creating the cache directory but with
> running out of space there? But what’s in there is also tiny.
> 
>  
> 
> bash-4.1$ hdfs dfs -du -h
> hdfs://d191291/user/delp/.flink/application_1510733430616_2098853
> 
> 1.1 K   
> hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/5c71e4b6-2567-4d34-98dc-73b29c502736-taskmanager-conf.yaml
> 
> 1.4 K   
> hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/flink-conf.yaml
> 
> 93.5 M  
> hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/flink-dist_2.10-1.2.1.jar
> 
> 264.8 M 
> hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/lib
> 
> 1.9 K   
> hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/log4j.properties
> 
>  
> 
>  
> 
> *From:*Chan, Regina [Tech]
> *Sent:* Tuesday, December 12, 2017 1:56 AM
> *To:* 'user@flink.apache.org'
> *Subject:* ProgramInvocationException: Could not upload the jar files to
> the job manager / No space left on device
> 
>  
> 
> Hi,
> 
>  
> 
> I’m currently submitting 50 separate jobs to a 50TM, 1 slot set up. Each
> job has 1 parallelism. There’s plenty of space left in my cluster and on
> that node. It’s not clear to me what’s happening. Any pointers?
> 
>  
> 
> On the client side, when I try to execute, I see the following:
> 
> org.apache.flink.client.program.ProgramInvocationException: The program
> execution failed: Could not upload the jar files to the job manager.
> 
>         at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
> 
>         at
> org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101)
> 
>         at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
> 
>         at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)
> 
>         at
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
> 
>         at
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926)
> 
>         at
> com.gs.ep.da.lake.refinerlib.flink.FlowData.execute(FlowData.java:143)
> 
>         at
> com.gs.ep.da.lake.refinerlib.flink.FlowData.flowPartialIngestionHalf(FlowData.java:107)
> 
>         at
> com.gs.ep.da.lake.refinerlib.flink.FlowData.call(FlowData.java:72)
> 
>         at
> com.gs.ep.da.lake.refinerlib.flink.FlowData.call(FlowData.java:39)
> 
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 
>         at java.lang.Thread.run(Thread.java:745)
> 
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Could
> not upload the jar files to the job manager.
> 
>         at
> org.apache.flink.runtime.client.JobSubmissionClientActor$1.call(JobSubmissionClientActor.java:150)
> 
>         at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:95)
> 
>         at
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> 
>         at
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> 
>         at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> 
>         at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
> 
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 
> Caused by: java.io.IOException: Could not retrieve the JobManager's blob
> port.
> 
>         at
> org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:745)
> 
>         at
> org.apache.flink.runtime.jobgraph.JobGraph.uploadUserJars(JobGraph.java:565)
> 
>         at
> org.apache.flink.runtime.client.JobSubmissionClientActor$1.call(JobSubmissionClientActor.java:148)
> 
>         ... 9 more
> 
> Caused by: java.io.IOException: PUT operation failed: Connection reset
> 
>         at
> org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:512)
> 
>         at org.apache.flink.runtime.blob.BlobClient.put(BlobClient.java:374)
> 
>         at
> org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:771)
> 
>         at
> org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:740)
> 
>         ... 11 more
> 
> Caused by: java.net.SocketException: Connection reset
> 
>         at
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
> 
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
> 
>         at
> org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:499)
> 
>         ... 14 more
> 
>  
> 
>  
> 
> On the job manager logs I see this:
> 
>  
> 
> 2017-12-12 01:42:47,608 ERROR
> org.apache.flink.runtime.blob.BlobServerConnection            - PUT
> operation failed
> 
> java.io.IOException: No space left on device
> 
>         at java.io.FileOutputStream.writeBytes(Native Method)
> 
>         at java.io.FileOutputStream.write(FileOutputStream.java:345)
> 
>         at
> org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
> 
>         at
> org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
> 
> 2017-12-12 01:42:47,608 ERROR
> org.apache.flink.runtime.blob.BlobServerConnection            - PUT
> operation failed
> 
> java.io.IOException: No space left on device
> 
>         at java.io.FileOutputStream.writeBytes(Native Method)
> 
>         at java.io.FileOutputStream.write(FileOutputStream.java:345)
> 
>         at
> org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
> 
>         at
> org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
> 
> 2017-12-12 01:42:47,608 ERROR
> org.apache.flink.runtime.blob.BlobServerConnection            - PUT
> operation failed
> 
> java.io.IOException: No space left on device
> 
>         at java.io.FileOutputStream.writeBytes(Native Method)
> 
>         at java.io.FileOutputStream.write(FileOutputStream.java:345)
> 
>         at
> org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
> 
>         at
> org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
> 
> 2017-12-12 01:42:47,608 ERROR
> org.apache.flink.runtime.blob.BlobServerConnection            - PUT
> operation failed
> 
> java.io.IOException: No space left on device
> 
>  
> 
>  
> 
>  
> 
>  
> 
> *Regina Chan*
> 
> *Goldman Sachs**–*Enterprise Platforms, Data Architecture
> 
> *30 Hudson Street, 37th floor | Jersey City, NY 07302*(  (212) 902-5697**
> 
>  
> 
