Looks like the following assertion failed:

Preconditions.checkState(storageIDsCount == locs.size());

locs is a List<DatanodeInfoProto>. Can you enhance the assertion to log more
information, along the lines of the sketch below? I have also appended a
small standalone probe after your quoted mail that exercises the same
listing code path outside Spark.
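For example, something like this (untested sketch; checkState's
message-template overload is standard Guava, and storageIDsCount / locs are
just the names from the failing line):

// In PBHelper.convert (in the Hadoop source the import is
// com.google.common.base.Preconditions, not the shaded
// org.spark-project copy that shows up in the trace). The same check,
// but a failure now reports both sides of the comparison instead of a
// bare IllegalStateException with no message:
Preconditions.checkState(storageIDsCount == locs.size(),
    "storageIDsCount %s does not match locs.size() %s",
    storageIDsCount, locs.size());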
Cheers

On Thu, Mar 26, 2015 at 3:06 PM, Dale Johnson <daljohn...@ebay.com> wrote:

> There seems to be a special kind of "corrupted according to Spark" state
> of file in HDFS. I have isolated a set of files (maybe 1% of all files I
> need to work with) which are producing the following stack dump when I
> try to sc.textFile() open them. When I try to open directories, most
> large directories contain at least one file of this type. Curiously, the
> following two lines fail inside of a Spark job, but not inside of a
> Scoobi job:
>
> val conf = new org.apache.hadoop.conf.Configuration
> val fs = org.apache.hadoop.fs.FileSystem.get(conf)
>
> The stack trace follows:
>
> 15/03/26 14:22:43 INFO yarn.ApplicationMaster: Final app status: FAILED,
> exitCode: 15, (reason: User class threw exception: null)
> Exception in thread "Driver" java.lang.IllegalStateException
>         at org.spark-project.guava.common.base.Preconditions.checkState(Preconditions.java:133)
>         at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:673)
>         at org.apache.hadoop.hdfs.protocolPB.PBHelper.convertLocatedBlock(PBHelper.java:1100)
>         at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1118)
>         at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1251)
>         at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1354)
>         at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1363)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:518)
>         at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1743)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$15.<init>(DistributedFileSystem.java:738)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:727)
>         at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1662)
>         at org.apache.hadoop.fs.FileSystem$5.<init>(FileSystem.java:1724)
>         at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1721)
>         at com.ebay.ss.niffler.miner.speller.SpellQueryLaunch$$anonfun$main$2.apply(SpellQuery.scala:1125)
>         at com.ebay.ss.niffler.miner.speller.SpellQueryLaunch$$anonfun$main$2.apply(SpellQuery.scala:1123)
>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>         at com.ebay.ss.niffler.miner.speller.SpellQueryLaunch$.main(SpellQuery.scala:1123)
>         at com.ebay.ss.niffler.miner.speller.SpellQueryLaunch.main(SpellQuery.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:427)
> 15/03/26 14:22:43 INFO yarn.ApplicationMaster: Invoking sc stop from
> shutdown hook
>
> It appears to have found the three copies of the given HDFS block, but is
> performing some sort of validation on them before giving them back to
> Spark to schedule the job, but an assert is failing.
>
> I've tried this with 1.2.0, 1.2.1 and 1.3.0, and I get the exact same
> error; the line numbers in the HDFS libraries change between versions,
> but the function names do not. I've tried recompiling myself with
> different Hadoop versions, and it's the same. We're running Hadoop 2.4.1
> on our cluster.
>
> A Google search turns up absolutely nothing on this.
>
> Any insight at all would be appreciated.
>
> Dale Johnson
> Applied Researcher
> eBay.com
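And here is the probe mentioned above, to help rule Spark in or out
(hypothetical; the ListProbe class and its argument handling are made up,
but it drives the same FileSystem.listFiles / listLocatedStatus chain that
fails in the Hadoop frames of your trace, using only the Hadoop client):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Lists a directory recursively through the same client call chain as
// the failing job: FileSystem.listFiles -> listLocatedStatus -> getListing.
public class ListProbe {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    RemoteIterator<LocatedFileStatus> it =
        fs.listFiles(new Path(args[0]), true);
    while (it.hasNext()) {
      System.out.println(it.next().getPath());
    }
  }
}

If this throws the same IllegalStateException on one of your problem
directories with the Spark assembly (and its shaded Guava) on the
classpath, but not with the plain Hadoop 2.4.1 client, that would point at
a client-side library mismatch rather than at the files themselves.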