Hi, I am facing a huge performance problem when I try to left-outer-join a very big dataset (~140 GB) with a bunch of small lookup tables (star-schema style). I am using DataFrames in Spark SQL. It looks like the data is shuffled and skewed when that join happens. Is there any way to improve the performance of this type of join in Spark?
How can I hint the optimizer to use a replicated join to avoid the shuffle? Would it help to create broadcast variables for the small lookups? If I create broadcast variables, how can I convert them into DataFrames and use them in a Spark SQL join?

Thanks,
Vijay