Hi,

The input in HDFS is a directory containing 890 files (largest 23M, smallest 145K, average size about 10M). It seems I am hitting some HDFS limit, because every file after a certain index (594) fails to load. For example, a full run of my code fails with the following error:

org.apache.hadoop.ipc.RemoteException: java.io.IOException: Negative length is not supported. File: /user/myname/input/data/data0594.xml.zip
       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:782)
       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:751)
       at org.apache.hadoop.hdfs.server.namenode.NameNode.getBlockLocations(NameNode.java:272)
       at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

       at org.apache.hadoop.ipc.Client.call(Client.java:697)
       at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
       at $Proxy0.getBlockLocations(Unknown Source)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
       at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
       at $Proxy0.getBlockLocations(Unknown Source)
       at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:279)
       at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:300)
       at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:163)
       at ZipSplit.getLocations(ZipSplit.java:114)
       at org.apache.hadoop.mapred.JobClient.writeSplitsFile(JobClient.java:966)
       at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:823)
       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
       at ContextCounterUltra.run(ContextCounterUltra.java:279)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
       at ContextCounterUltra.main(ContextCounterUltra.java:292)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
       at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
       at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

My guess is based on the following observations:
1. Every file after 594 that I tried produced the same error.
2. When I deleted file 594, the error moved to file 595.
3. A smaller subset of the files (500 files) works fine.
4. To rule out damaged files, I re-uploaded them into another directory and copied them into HDFS again; the same error appears.

Has anyone seen a similar problem before and knows what is wrong? I suspect something is wrong with the disk or system settings.
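
In case it helps, below is a minimal sketch (assuming the standard Hadoop FileSystem API; the class name ListInputLengths and the hard-coded path are just for illustration) of how I could print the length the NameNode reports for each file in the input directory, to see whether data0594.xml.zip and the later files show up with a zero or negative length:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Print the length HDFS reports for every file in the input directory,
// flagging anything with a non-positive length.
public class ListInputLengths {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path("/user/myname/input/data");  // my input directory
    for (FileStatus status : fs.listStatus(dir)) {
      long len = status.getLen();
      String flag = (len <= 0) ? "   <-- suspicious" : "";
      System.out.println(status.getPath() + "\t" + len + flag);
    }
  }
}

A plain "bin/hadoop dfs -ls /user/myname/input/data" should show the same sizes, so I can compare the two.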

--
Shi
