Hi, if the join keys are skewed, is there a specific optimized join available in Spark for such use cases?
I saw that both Scalding and Hive support a similar feature, and I am testing skewJoinWithSmaller on one of the skewed datasets:

http://twitter.github.io/scalding/com/twitter/scalding/JoinAlgorithms.html (skewJoinWithSmaller)
https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization (I am not sure if this is still a proposal or has already been added)

I guess that using a hash partition we could generate new join keys ("salting") for the cases where the join key is skewed. I was wondering if something like this is already available in the Spark API.

Thanks.
Deb
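P.S. To make the salting idea concrete, here is a minimal sketch in plain Python (no Spark; the helper names and the N_SALTS parameter are just illustrative). The large, skewed side gets a random salt appended to each key, and the small side is replicated once per salt value so every salted key still finds its match; the hot key's work then spreads across N_SALTS partitions instead of one.

```python
import random
from collections import defaultdict

N_SALTS = 4  # number of salt buckets; tunable, chosen arbitrarily here


def salt_large(records):
    # Large, skewed side: append a random salt to each key.
    return [((key, random.randrange(N_SALTS)), value) for key, value in records]


def replicate_small(records):
    # Small side: replicate each record once per salt value so every
    # salted key on the large side has a matching partner.
    return [((key, s), value) for key, value in records for s in range(N_SALTS)]


def join(left, right):
    # Plain hash join on the salted keys; strip the salt from the output key.
    index = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    return [(k[0], (lv, rv)) for k, lv in left for rv in index[k]]


large = [("hot", i) for i in range(8)] + [("rare", 99)]  # "hot" is skewed
small = [("hot", "H"), ("rare", "R")]

result = join(salt_large(large), replicate_small(small))
```

Every original pair survives the salted join; the cost is replicating the small side N_SALTS times, which is why this trick only makes sense when one side is small or when salting is limited to the known-hot keys.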