Re: SortMergeJoinExec: Utilizing child partitioning when joining

2020-01-02 Thread Brett Marcott
Thanks for the response, Andrew. *1. The approach* The approach I mentioned will not introduce any new skew, so it should only worsen performance if the user was relying on the shuffle to fix skew they had before. The user can address this by either not introducing their own skewed partition in

Re: SortMergeJoinExec: Utilizing child partitioning when joining

2020-01-02 Thread Long, Andrew
“Thoughts on this approach?” Just to warn you, this is a hazardous optimization without cardinality information. Removing columns from the hash exchange reduces entropy, potentially resulting in skew. Also keep in mind that if you reduce the number of columns on one side of the join you need

Re: Revisiting Python / pandas UDF (new proposal)

2020-01-02 Thread Li Jin
I am going to review this carefully today. Thanks for the work! Li On Wed, Jan 1, 2020 at 10:34 PM Hyukjin Kwon wrote: > Thanks for the comments, Maciej - I am addressing them. > adding Li Jin too. > > I plan to proceed with this late this week or early next week to make it in > time before code freeze.