Hi, if the join keys are skewed, is there a specific optimized join available in Spark for such use cases?
I saw that both Scalding and Hive support a similar feature, and I am testing skewJoinWithSmaller on one of the skewed datasets:

http://twitter.github.io/scalding/com/twitter/scalding/JoinAlgorithms.html (skewJoinWithSmaller)
https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization (I am not sure if this is still a proposal or has already been added)

I guess that using a hash partition we could generate new join keys ("salting") for the cases where the join key is skewed. I was wondering if something like this is already available in the Spark API.

Thanks.
Deb
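P.S. To make the salting idea concrete, here is a minimal sketch in plain Python (no Spark; the helper names and the N_SALTS parameter are just illustrative). The large, skewed side gets a random salt appended to each key, and the small side is replicated once per salt value so every salted key still finds its match; the hot key's work then spreads across N_SALTS partitions instead of one.

```python
import random
from collections import defaultdict

N_SALTS = 4  # number of salt buckets; tunable, chosen arbitrarily here


def salt_large(records):
    # Large, skewed side: append a random salt to each key.
    return [((key, random.randrange(N_SALTS)), value) for key, value in records]


def replicate_small(records):
    # Small side: replicate each record once per salt value so every
    # salted key on the large side has a matching partner.
    return [((key, s), value) for key, value in records for s in range(N_SALTS)]


def join(left, right):
    # Plain hash join on the salted keys; strip the salt from the output key.
    index = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    return [(k[0], (lv, rv)) for k, lv in left for rv in index[k]]


large = [("hot", i) for i in range(8)] + [("rare", 99)]  # "hot" is skewed
small = [("hot", "H"), ("rare", "R")]

result = join(salt_large(large), replicate_small(small))
```

Every original pair survives the salted join; the cost is replicating the small side N_SALTS times, which is why this trick only makes sense when one side is small or when salting is limited to the known-hot keys.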