date:20191110

Re: Why not implement CodegenSupport in class ShuffledHashJoinExec?

2019-11-10 Thread Wenchen Fan

Yeah, codegen can be a good improvement. PRs are welcome!

On Sun, Nov 10, 2019 at 6:28 PM Wang, Gang wrote:
> That’s right. By default, Spark prefers sort merge join.
>
> While, in our product environment, there are many huge bucket tables. We can leverage the bucketing to avoid shuffle when joi

Re: dev/merge_spark_pr.py broken on python 2

2019-11-10 Thread Hyukjin Kwon

Yeah, let's stick to Python 3 in general. I plan to drop Python 2 completely right after the Spark 3.0 release. The exception you are facing seems to come from run_cmd now producing unicode instead of bytes in Python 2 with the merge script. Later, it seems an attempt is made to cast this unicode to bytes implicit
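
The bytes-versus-unicode mismatch described in this reply can be illustrated with a minimal sketch. This is not the code in dev/merge_spark_pr.py: the run_cmd below is a hypothetical stand-in that only borrows the name, and the command it runs is an arbitrary example. It assumes Python 3 and shows one common way to avoid implicit conversions, which is to decode command output once, at the boundary, so callers only ever see text.

```python
import subprocess
import sys


def run_cmd(cmd):
    """Run a command and return its output as text.

    Hypothetical helper for illustration only; it is not the
    implementation used by the Spark merge script.
    """
    if isinstance(cmd, str):
        cmd = cmd.split(" ")
    output = subprocess.check_output(cmd)
    # check_output returns bytes on Python 3. Decode explicitly here so
    # that no later code has to guess whether it holds bytes or text.
    if isinstance(output, bytes):
        return output.decode("utf-8")
    return output


if __name__ == "__main__":
    # Use the current interpreter as a portable example command.
    print(run_cmd([sys.executable, "-c", "print('ok')"]).strip())
```

Decoding explicitly in one place keeps the rest of a script free of implicit casts, which is where this kind of exception tends to surface.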

Re: Build customized resource manager

2019-11-10 Thread Klaus Ma

Hm, it would be better for me if we could build a customized resource manager outside of core; otherwise, we have to go through a long discussion in the community :) But if we support that, why are the Mesos/YARN/K8s resource managers still in the tree?

On Fri, Nov 8, 2019 at 10:18 PM Tom Graves wrote:

Re: Why not implement CodegenSupport in class ShuffledHashJoinExec?

2019-11-10 Thread Wang, Gang

That’s right. By default, Spark prefers sort merge join. However, in our production environment, there are many huge bucketed tables. We can leverage the bucketing to avoid a shuffle when joining with other small tables (the small tables are not small enough to use broadcast join). Problem is that, al
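
For readers unfamiliar with the two strategies compared in this thread, here is a toy, single-process sketch in plain Python. It is not Spark code and says nothing about how ShuffledHashJoinExec or SortMergeJoinExec are implemented; the function names and the sample rows are made up for illustration. It only shows the core difference: a hash join builds a hash table on the smaller side and probes it, while a sort merge join sorts both sides and walks them in step.

```python
from collections import defaultdict


def hash_join(small, large):
    """Inner join of (key, value) rows: build on `small`, probe with `large`."""
    table = defaultdict(list)
    for key, value in small:          # build phase
        table[key].append(value)
    # Probe phase: one hash lookup per row of the larger side.
    return [(key, s, l) for key, l in large for s in table.get(key, [])]


def sort_merge_join(left, right):
    """Inner join of (key, value) rows by sorting both sides and merging."""
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for k in range(j, j_end):
                    out.append((lk, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return out


if __name__ == "__main__":
    small = [(1, "a"), (2, "b")]
    large = [(2, "x"), (1, "y"), (2, "z"), (3, "w")]
    # Both strategies produce the same set of joined rows.
    print(sorted(hash_join(small, large)) == sorted(sort_merge_join(small, large)))
```

The hash join skips the sort entirely, which is the intuition behind the cases mentioned in the thread where it can outperform sort merge join when one side fits comfortably in a hash table.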

Re: Why not implement CodegenSupport in class ShuffledHashJoinExec?

2019-11-10 Thread Wenchen Fan

By default, sort merge join is preferred over shuffle hash join, which is why we haven't spent resources on implementing codegen for it.

On Sun, Nov 10, 2019 at 3:15 PM Wang, Gang wrote:
> There are some cases, shuffle hash join performs even better than sort merge join.
>
> While, I noticed that Sh


5 matches
