[ https://issues.apache.org/jira/browse/HADOOP-4637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved HADOOP-4637.
--------------------------------------
    Resolution: Incomplete

Closing this as a stale issue.

> Unhandled failures starting jobs with S3 as backing store
> ---------------------------------------------------------
>
>                 Key: HADOOP-4637
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4637
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 0.18.1
>            Reporter: Robert
>
> I run Hadoop 0.18.1 on Amazon EC2, with S3 as the backing store.
> When starting jobs, I sometimes get the following failure, which causes the
> job to be abandoned:
>
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
>     at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveBlock(Jets3tFileSystemStore.java:222)
>     at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy4.retrieveBlock(Unknown Source)
>     at org.apache.hadoop.fs.s3.S3InputStream.blockSeekTo(S3InputStream.java:160)
>     at org.apache.hadoop.fs.s3.S3InputStream.read(S3InputStream.java:119)
>     at java.io.DataInputStream.read(DataInputStream.java:83)
>     at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
>     at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:214)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:150)
>     at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1212)
>     at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1193)
>     at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:177)
>     at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1783)
>     at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>     at org.apache.hadoop.ipc.Client.call(Client.java:715)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>     at org.apache.hadoop.mapred.$Proxy5.submitJob(Unknown Source)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
>
> The stack trace suggests that copying the job file fails because the S3
> block filesystem cannot find all of the expected block objects when it
> needs them. Since S3 is an "eventually consistent" store and does not
> always provide an up-to-date view of the stored data, this execution path
> should probably be strengthened: at a minimum, retry the failed operation,
> or wait for the expected block object if it has not yet appeared.
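For reference, a minimal sketch of the retry/wait strategy the reporter
suggests. The BlockStore interface and retrieveBlockWithRetry helper below
are hypothetical stand-ins, not part of the Hadoop API; they model
Jets3tFileSystemStore.retrieveBlock, which can return null (triggering the
NPE above) when S3's eventual consistency hides a block object that was
written moments earlier:

    import java.io.File;
    import java.io.IOException;

    public class RetryingBlockFetch {

        // Hypothetical stand-in for the store's retrieveBlock call; it may
        // return null while the block object is not yet visible in S3.
        public interface BlockStore {
            File retrieveBlock(long blockId, long byteRangeStart) throws IOException;
        }

        public static File retrieveBlockWithRetry(BlockStore store, long blockId,
                                                  long byteRangeStart, int maxAttempts)
                throws IOException {
            long backoffMs = 500;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                File block = store.retrieveBlock(blockId, byteRangeStart);
                if (block != null) {
                    return block;  // the block object became visible
                }
                if (attempt == maxAttempts) {
                    break;
                }
                try {
                    // Wait for the expected block object to show up,
                    // backing off exponentially between attempts.
                    Thread.sleep(backoffMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted waiting for block " + blockId, e);
                }
                backoffMs *= 2;
            }
            // Fail with a descriptive error instead of letting a null
            // result propagate as a NullPointerException.
            throw new IOException("Block " + blockId + " not found after "
                    + maxAttempts + " attempts");
        }
    }

Bounding the attempts keeps a genuinely missing block from hanging job
submission forever, while the backoff gives S3 time to converge.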