Hi Michael,

Thanks for the suggestion. In my query, both table are too large to use
broadcast join.

When SPARK-2211 is done, will spark sql automatically choose join
algorithms?
Is there some way to manually hint the optimizer?


2014-07-19 5:23 GMT+08:00 Michael Armbrust <mich...@databricks.com>:

> Unfortunately, this is a query where we just don't have an efficiently
> implementation yet.  You might try switching the table order.
>
> Here is the JIRA for doing something more efficient:
> https://issues.apache.org/jira/browse/SPARK-2212
>
>
> On Fri, Jul 18, 2014 at 7:05 AM, Pei-Lun Lee <pl...@appier.com> wrote:
>
>> Hi,
>>
>> We have a query with left joining and got this error:
>>
>> Caused by: org.apache.spark.SparkException: Job aborted due to stage
>> failure: Task 1.0:0 failed 4 times, most recent failure: Exception failure
>> in TID 5 on host ip-10-33-132-101.us-west-2.compute.internal:
>> com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0,
>> required: 1
>>
>> Looks like spark sql tried to do a broadcast join and collecting one of
>> the table to master but it is too large.
>>
>> How do we explicitly control the join behavior like this?
>>
>> --
>> Pei-Lun Lee
>>
>>
>

Reply via email to