Example to handle data skewness

2018-01-29 Thread Sejal Chauhan
Hi Dev community, a large data skew is causing memory problems in my cluster. I was wondering whether anyone has tackled this with their own hash function, and whether it worked for the same cluster configuration. Thanks, Sejal
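As an illustration of what a custom hash function for skewed keys might look like, here is a minimal sketch of key salting in plain Python. It is not taken from this thread and does not use any Spark API. The function names (`stable_hash`, `salted_key`, `salted_partition`, `partition_sizes`), the known set of hot keys, and the salt count are all hypothetical, chosen only for the example.

```python
import hashlib
from collections import Counter


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash; Python's built-in hash() of str varies per process."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def plain_partition(key: str, num_partitions: int) -> int:
    """Ordinary hash partitioning: every record of a key lands in one partition."""
    return stable_hash(key) % num_partitions


def salted_key(key: str, record_index: int, hot_keys: set, num_salts: int) -> str:
    """Append a rotating salt to hot keys only (assumes hot keys are known up front)."""
    if key in hot_keys:
        return f"{key}#{record_index % num_salts}"
    return key


def salted_partition(key: str, record_index: int, num_partitions: int,
                     hot_keys: set, num_salts: int) -> int:
    """Hash the salted key, so one hot key can spread over several partitions."""
    salted = salted_key(key, record_index, hot_keys, num_salts)
    return stable_hash(salted) % num_partitions


def partition_sizes(keys, partition_fn) -> Counter:
    """Count how many records each partition receives."""
    return Counter(partition_fn(key, index) for index, key in enumerate(keys))


if __name__ == "__main__":
    # Hypothetical skewed input: one key accounts for most of the records.
    keys = ["hot"] * 1000 + [f"k{i}" for i in range(100)]
    plain = partition_sizes(keys, lambda k, i: plain_partition(k, 8))
    salted = partition_sizes(keys, lambda k, i: salted_partition(k, i, 8, {"hot"}, 16))
    print("largest partition, plain :", max(plain.values()))
    print("largest partition, salted:", max(salted.values()))
```

Salting changes the grouping key, so any aggregation on the original key needs a second step that strips the salt and merges the partial results. For a join, the other side has to be replicated once per salt value.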

BroadcastHashJoinExec cleanup

2018-01-29 Thread Marco Gaido
Hello, looking at BroadcastHashJoinExec, it seems to me that it never destroys the broadcast variables, and I think this can cause problems like SPARK-22575. When I tried to add a "cleanup" step to destroy the variable, I saw some test failures because it was trying to access the destroyed

Nondeterministic Catalyst expressions -- trait and property?!

2018-01-29 Thread Jacek Laskowski
Hi, why does Spark SQL need a Nondeterministic trait [1] and a property? That must be confusing for others, not only me, right? [1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L299 [2] https://github.com/apache/spa