Hi Michael, Thanks for the suggestion. In my query, both table are too large to use broadcast join.
When SPARK-2211 is done, will spark sql automatically choose join algorithms? Is there some way to manually hint the optimizer? 2014-07-19 5:23 GMT+08:00 Michael Armbrust <mich...@databricks.com>: > Unfortunately, this is a query where we just don't have an efficiently > implementation yet. You might try switching the table order. > > Here is the JIRA for doing something more efficient: > https://issues.apache.org/jira/browse/SPARK-2212 > > > On Fri, Jul 18, 2014 at 7:05 AM, Pei-Lun Lee <pl...@appier.com> wrote: > >> Hi, >> >> We have a query with left joining and got this error: >> >> Caused by: org.apache.spark.SparkException: Job aborted due to stage >> failure: Task 1.0:0 failed 4 times, most recent failure: Exception failure >> in TID 5 on host ip-10-33-132-101.us-west-2.compute.internal: >> com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, >> required: 1 >> >> Looks like spark sql tried to do a broadcast join and collecting one of >> the table to master but it is too large. >> >> How do we explicitly control the join behavior like this? >> >> -- >> Pei-Lun Lee >> >> >