Yes, you can usually use a broadcast join to avoid skew problems.

On Wed, May 2, 2018 at 8:57 PM, Pralabh Kumar <pralabhku...@gmail.com>
wrote:

> I am performing join operation , if I convert reduce side join to map side
> (no shuffle will happen)  and I assume in that case this error shouldn't
> come. Let me know if this understanding is correct
>
> On Tue, May 1, 2018 at 9:37 PM, Ryan Blue <rb...@netflix.com> wrote:
>
>> This is usually caused by skew. Sometimes you can work around it by in
>> creasing the number of partitions like you tried, but when that doesn’t
>> work you need to change the partitioning that you’re using.
>>
>> If you’re aggregating, try adding an intermediate aggregation. For
>> example, if your query is select sum(x), a from t group by a, then try select
>> sum(partial), a from (select sum(x) as partial, a, b from t group by a, b)
>> group by a.
>>
>> rb
>> ​
>>
>> On Tue, May 1, 2018 at 4:21 AM, Pralabh Kumar <pralabhku...@gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> I am getting the above error in Spark SQL . I have increase (using 5000
>>> ) number of partitions but still getting the same error .
>>>
>>> My data most probably is skew.
>>>
>>>
>>>
>>> org.apache.spark.shuffle.FetchFailedException: Too large frame: 4247124829
>>>     at 
>>> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:419)
>>>     at 
>>> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:349)
>>>
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
>


-- 
Ryan Blue
Software Engineer
Netflix

Reply via email to