Yes, you can usually use a broadcast join to avoid skew problems.
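For example, here is a minimal sketch of forcing a map-side (broadcast) join with the DataFrame API. The table and column names (large_table, small_table, id) are placeholders, not from this thread, and the broadcast side has to fit in memory on each executor:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    val spark = SparkSession.builder().appName("broadcast-join-sketch").getOrCreate()

    // Placeholder tables: small_table must be small enough to replicate to every executor.
    val large = spark.table("large_table")
    val small = spark.table("small_table")

    // broadcast() hints Spark to ship the small side to all executors, so the join
    // runs map-side and the large, skewed side is never shuffled.
    val joined = large.join(broadcast(small), Seq("id"))

Spark will also broadcast automatically when the small side is below spark.sql.autoBroadcastJoinThreshold (10 MB by default).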
On Wed, May 2, 2018 at 8:57 PM, Pralabh Kumar <pralabhku...@gmail.com> wrote:

> I am performing a join operation. If I convert the reduce-side join to a
> map-side join (so no shuffle will happen), I assume this error shouldn't
> occur. Let me know if this understanding is correct.
>
> On Tue, May 1, 2018 at 9:37 PM, Ryan Blue <rb...@netflix.com> wrote:
>
>> This is usually caused by skew. Sometimes you can work around it by
>> increasing the number of partitions like you tried, but when that doesn't
>> work you need to change the partitioning that you're using.
>>
>> If you're aggregating, try adding an intermediate aggregation. For
>> example, if your query is select sum(x), a from t group by a, then try
>> select sum(partial), a from (select sum(x) as partial, a, b from t
>> group by a, b) group by a.
>>
>> rb
>>
>> On Tue, May 1, 2018 at 4:21 AM, Pralabh Kumar <pralabhku...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I am getting the above error in Spark SQL. I have increased the number
>>> of partitions (to 5000) but am still getting the same error.
>>>
>>> My data is most probably skewed.
>>>
>>> org.apache.spark.shuffle.FetchFailedException: Too large frame: 4247124829
>>>     at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:419)
>>>     at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:349)
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix

--
Ryan Blue
Software Engineer
Netflix
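For reference, the intermediate-aggregation rewrite quoted above as a runnable sketch; it assumes the hypothetical table t(a, b, x) from that example and a SparkSession named spark (predefined in spark-shell). The inner GROUP BY a, b spreads the rows of a hot value of a across many (a, b) groups, so no single shuffle block has to carry all of them, provided b has enough distinct values per a:

    // Original, skew-prone form:  SELECT sum(x), a FROM t GROUP BY a
    // Two-stage rewrite with a partial aggregate on (a, b):
    val result = spark.sql("""
      SELECT sum(partial) AS total, a
      FROM (
        SELECT sum(x) AS partial, a, b
        FROM t
        GROUP BY a, b
      ) pre
      GROUP BY a
    """)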