I think you are running into a combo of https://issues.apache.org/jira/browse/SPARK-5928 and https://issues.apache.org/jira/browse/SPARK-5945
The standard solution is to increase the number of partitions you are creating. textFile(), reduceByKey(), and sortByKey() all take an optional second argument where you can specify the number of partitions to use. It looks like it's using spark.default.parallelism right now, which will usually be the number of cores in your cluster (not sure what that is in your case).

The exception you gave shows you're about 6x over the limit in at least this one case (12716268407 against a limit of 2147483647), so I'd start with at least 10x the number of partitions you have now, and increase until it works (or you run into some other problem from too many partitions ...)

I'd also strongly suggest doing the filter before you do the sortByKey -- no reason to force all that data through the sort if you're going to throw a lot of it away. It's not completely clear where you are hitting the error now -- that alone might even solve your problem.

Hope this helps,
Imran

On Thu, Mar 19, 2015 at 5:28 PM, roni <roni.epi...@gmail.com> wrote:

> I get 2 types of error -
> -org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 and
> FetchFailedException: Adjusted frame length exceeds 2147483647: 12716268407 - discarded
>
> Spark keeps re-trying to submit the code and keeps getting this error.
>
> My file on which I am finding the sliding window strings is 500 MB and I am doing it with length = 150.
> It works fine till length is 100.
>
> This is my code -
> val hgfasta = sc.textFile(args(0)) // read the fasta file
> val kCount = hgfasta.flatMap(r => { r.sliding(args(2).toInt) })
> val kmerCount = kCount.map(x => (x, 1)).reduceByKey(_ + _).map { case (x, y) => (y, x) }.sortByKey(false).map { case (i, j) => (j, i) }
>
> val filtered = kmerCount.filter(kv => kv._2 < 5)
> filtered.map(kv => kv._1 + ", " + kv._2.toLong).saveAsTextFile(args(1))
>
> }
> It gets stuck at flatMap, and saveAsTextFile throws
> -org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 and
>
> org.apache.spark.shuffle.FetchFailedException: Adjusted frame length exceeds 2147483647: 12716268407 - discarded
>         at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
>         at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
>         at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
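
For reference, here is a rough sketch of how the two suggestions in the reply (an explicit partition count, and filtering before the sort) might be applied to the quoted job. It is untested against the original data. The object name KmerCount, the helper kmers, and the extra argument args(3) holding the partition count are assumptions made for illustration and are not part of the original code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; only needed on older Spark releases

object KmerCount {
  // Sliding windows of length k over one line of the input file.
  def kmers(line: String, k: Int): Iterator[String] = line.sliding(k)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("KmerCount"))

    val k = args(2).toInt
    // Assumption: the partition count is passed as a fourth argument.
    // Start at roughly 10x the current count and raise it if the error persists.
    val numPartitions = args(3).toInt

    // Second argument of textFile: minimum number of input partitions.
    val hgfasta = sc.textFile(args(0), numPartitions)

    // Second argument of reduceByKey: number of shuffle partitions.
    val counts = hgfasta
      .flatMap(line => kmers(line, k))
      .map(kmer => (kmer, 1))
      .reduceByKey(_ + _, numPartitions)

    // Filter first, so only the surviving k-mers go through the sort shuffle.
    val rare = counts.filter { case (_, count) => count < 5 }

    // Sort by count, descending, with an explicit partition count.
    val sorted = rare
      .map { case (kmer, count) => (count, kmer) }
      .sortByKey(ascending = false, numPartitions = numPartitions)
      .map { case (count, kmer) => (kmer, count) }

    sorted.map { case (kmer, count) => kmer + ", " + count }.saveAsTextFile(args(1))
    sc.stop()
  }
}
```

Since the filter only looks at the count, moving it ahead of sortByKey should give the same output as before while shuffling less data.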