Hi,
The input in HDFS is a directory containing 890 files (largest 23 MB,
smallest 145 KB, average about 10 MB). It seems I am hitting some limit
in HDFS, because every file after a certain point (number 594) fails to
load. For example, a full run of my code produces the following error:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Negative length is not supported. File: /user/myname/input/data/data0594.xml.zip
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:782)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:751)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.getBlockLocations(NameNode.java:272)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

    at org.apache.hadoop.ipc.Client.call(Client.java:697)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy0.getBlockLocations(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy0.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:279)
    at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:300)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:163)
    at ZipSplit.getLocations(ZipSplit.java:114)
    at org.apache.hadoop.mapred.JobClient.writeSplitsFile(JobClient.java:966)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:823)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
    at ContextCounterUltra.run(ContextCounterUltra.java:279)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at ContextCounterUltra.main(ContextCounterUltra.java:292)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
My guess is based on the following observations:
1. I tried each of the files after 594 individually; the error always occurs.
2. I deleted file 594; the error then pops up on file 595.
3. I tried a small subset of the files (500 of them); that works fine.
4. To make sure the files are not damaged, I re-uploaded them, stored
them in another directory, and put them into HDFS again; the same error
pops up.
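For what it's worth, a quick way to rule out corrupt or empty archives on the local side before uploading is a check along these lines (a sketch using only the Python standard library; the directory name "data" is just an example from my setup):

```python
import os
import zipfile

def find_bad_zips(directory):
    """Return paths of .zip files that are empty, truncated, or corrupt."""
    bad = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".zip"):
            continue
        path = os.path.join(directory, name)
        # A zero-byte or truncated file is the kind of thing that could
        # lead to a bogus length being reported downstream.
        if os.path.getsize(path) == 0 or not zipfile.is_zipfile(path):
            bad.append(path)
            continue
        # testzip() re-reads every member; it returns the name of the
        # first corrupt entry, or None if all entries check out.
        with zipfile.ZipFile(path) as zf:
            if zf.testzip() is not None:
                bad.append(path)
    return bad

if __name__ == "__main__" and os.path.isdir("data"):
    for path in find_bad_zips("data"):
        print("corrupt or empty:", path)
```

In my case all 890 archives pass this kind of local check, which is why I suspect the problem is on the HDFS side rather than in the files themselves.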
Have you seen a similar problem before, and do you know what is wrong?
I suspect something is wrong with the disk or system settings.
--
Shi