I think you are running into a combo of

https://issues.apache.org/jira/browse/SPARK-5928
and
https://issues.apache.org/jira/browse/SPARK-5945

The standard solution is to just increase the number of partitions you are
creating. textFile(), reduceByKey(), and sortByKey() all take an optional
second argument, where you can specify the number of partitions you use.
It looks like it's using spark.default.parallelism right now, which usually
defaults to the number of cores in your cluster (not sure what that is in
your case).  The exception you gave shows you're about 6x over the limit in
at least this one case, so I'd start with at least 10x the number of
partitions you have now, and increase until it works (or you run into some
other problem from too many partitions ...)
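As a sketch (the path and the partition count of 2000 are placeholders --
you'd tune the number for your data):

```scala
// Sketch only: numPartitions is a placeholder to tune, not a recommendation.
// textFile, reduceByKey, and sortByKey all accept an explicit partition count.
val numPartitions = 2000

val lines = sc.textFile("hdfs:///path/to/input", numPartitions)
val counts = lines
  .flatMap(_.split("\\s+"))
  .map(w => (w, 1))
  .reduceByKey(_ + _, numPartitions)           // partitions for this shuffle
val sorted = counts
  .map { case (k, v) => (v, k) }
  .sortByKey(ascending = false, numPartitions) // partitions for the sort shuffle
```

More partitions means each shuffle block is smaller, which is what keeps you
under the 2 GB frame limit.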

I'd also strongly suggest doing the filter before you do the sortByKey --
there's no reason to shuffle all that data if you're going to throw a lot of
it away.  It's not completely clear where you're hitting the error now --
that change alone might even solve your problem.
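A sketch of that reordering, with placeholder names rather than your exact
variables:

```scala
import org.apache.spark.rdd.RDD

// Sketch: filter on the count *before* sorting, so the sort (and its
// shuffle) only ever sees the rows you actually keep.
def keepRare(counts: RDD[(String, Int)]): RDD[(String, Int)] =
  counts
    .filter { case (_, c) => c < 5 }        // drop high-count rows early
    .map { case (k, c) => (c, k) }          // swap so the count is the key
    .sortByKey(ascending = false)           // sort only the surviving rows
    .map { case (c, k) => (k, c) }          // swap back
```

If most k-mers have counts >= 5, this shrinks the sort's shuffle dramatically.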

hope this helps,
Imran


On Thu, Mar 19, 2015 at 5:28 PM, roni <roni.epi...@gmail.com> wrote:

> I get 2 types of error -
> -org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output
> location for shuffle 0 and
> FetchFailedException: Adjusted frame length exceeds 2147483647:
> 12716268407 - discarded
>
> Spark keeps re-trying to submit the code and keeps getting this error.
>
> The file in which I am finding the sliding-window strings is 500 MB, and I
> am doing it with length = 150.
> It works fine until length is 100.
>
> This is my code -
>   val hgfasta = sc.textFile(args(0)) // read the fasta file
>   val kCount = hgfasta.flatMap(r => r.sliding(args(2).toInt))
>   val kmerCount = kCount
>     .map(x => (x, 1))
>     .reduceByKey(_ + _)
>     .map { case (x, y) => (y, x) }
>     .sortByKey(false)
>     .map { case (i, j) => (j, i) }
>
>   val filtered = kmerCount.filter(kv => kv._2 < 5)
>   filtered.map(kv => kv._1 + ", " + kv._2.toLong).saveAsTextFile(args(1))
> }
> It gets stuck at the flatMap and saveAsTextFile, and throws
> -org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output
> location for shuffle 0 and
>
> org.apache.spark.shuffle.FetchFailedException: Adjusted frame length exceeds 
> 2147483647: 12716268407 - discarded
>       at 
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
>       at 
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
>       at 
> org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
>
>
>
