Re: [DISCUSS][SPARK-30275] Discussion about whether to add a gitlab-ci.yml file

2020-01-23 Thread Jim Kleckner
I understand that "non-dev" persons could become confused and that some sort of signposting/warning makes sense. Certainly I consider my personal registry on gitlab.com as ephemeral and not intended for publication. We have our own private GitLab instance where I put artifacts that are derived and …

Re: Enabling push-based shuffle in Spark

2020-01-23 Thread mshen
Hi Wenchen, Glad to know that you like this idea. We also looked into making this pluggable in our early design phase. While the ShuffleManager API for pluggable shuffle systems does provide considerable room for customized Spark shuffle behavior, we feel that it is still not enough for this …

Re: [DISCUSS][SPARK-30275] Discussion about whether to add a gitlab-ci.yml file

2020-01-23 Thread Sean Owen
Yeah, the color on this is that 'snapshot' or 'nightly' builds are not quite _discouraged_ by the ASF, but they need to be something only devs are likely to find, and clearly signposted, because they aren't officially blessed releases. It gets into a gray area if the project is 'officially' hosting a way …

Re: [DISCUSS][SPARK-30275] Discussion about whether to add a gitlab-ci.yml file

2020-01-23 Thread Dongjoon Hyun
Hi, Jim. Thank you for the proposal. I understand the request. However, the following key benefit sounds like unofficial snapshot binary releases. > For example, this was used to build a version of Spark that included SPARK-28938, which has yet to be released and was necessary for spark-operator …

[DISCUSS][SPARK-30275] Discussion about whether to add a gitlab-ci.yml file

2020-01-23 Thread Jim Kleckner
This story [1] proposes adding a .gitlab-ci.yml file to make it easy to create artifacts and images for Spark. Using this mechanism, people can submit any subsequent version of Spark for building and image hosting with gitlab.com. There is a companion WIP branch [2] with a candidate and example …
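For readers unfamiliar with GitLab CI, a pipeline of this kind would look roughly like the sketch below. This is a hypothetical minimal example, not the contents of the WIP branch [2]; the job name, image tag, and artifact path are illustrative (though `./build/mvn` is the Maven wrapper that ships in the Spark repo):

```yaml
# Hypothetical minimal .gitlab-ci.yml for building Spark artifacts.
# Stage/job names and the artifact path below are illustrative only.
stages:
  - build

build-spark:
  stage: build
  image: maven:3.6-jdk-8          # assumed build image, not from the WIP branch
  script:
    - ./build/mvn -DskipTests package
  artifacts:
    paths:
      - assembly/target/
```

Pushing any branch or tag to a gitlab.com fork with such a file would trigger the build and retain the resulting artifacts, which is precisely the "unofficial snapshot" concern raised elsewhere in this thread.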

Re: [DISCUSS] Revert and revisit the public custom expression API for partition (a.k.a. Transform API)

2020-01-23 Thread Wenchen Fan
I don't think we want to add a lot of flexibility to the PARTITION BY expressions. It's usually just columns or nested fields, or some common functions like year, month, etc. If you look at the parser, we create DS V2 Expressions directly. The partition-specific expressions are for …
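The limited vocabulary Wenchen describes (columns, nested fields, and a few common functions) can be seen in the DS V2 transform factory. A minimal sketch, assuming the Spark 3.x `org.apache.spark.sql.connector.expressions.Expressions` API (method names as of that API; verify against the version at hand):

```scala
import org.apache.spark.sql.connector.expressions.{Expressions, Transform}

// Sketch: the kinds of partition transforms the parser produces
// for PARTITIONED BY clauses, built via the DS V2 factory.
val partitioning: Array[Transform] = Array(
  Expressions.identity("region"), // a plain column reference
  Expressions.years("ts"),        // year-granularity transform on a timestamp
  Expressions.bucket(16, "id")    // hash-bucket transform
)
```

The point of the thread is that this closed set is deliberate: arbitrary user expressions in PARTITION BY would be hard to reason about for sources implementing the Transform API.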

Re: Enabling push-based shuffle in Spark

2020-01-23 Thread Wenchen Fan
The name "push-based shuffle" is a little misleading. This seems like a better shuffle service that co-locates the shuffle blocks of one reducer at the map phase. I think this is a good idea. Is it possible to make it completely external via the shuffle plugin API? This looks like a good use case of …
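The plugin hook Wenchen alludes to is the `spark.shuffle.manager` configuration: Spark reflectively instantiates whatever `ShuffleManager` class it names. A rough, hypothetical skeleton follows; method signatures approximate the Spark 3.x `ShuffleManager` trait at the time of this thread, and since that trait is `private[spark]`, a real implementation would have to live under the `org.apache.spark` package:

```scala
package org.apache.spark.shuffle

import org.apache.spark.{ShuffleDependency, SparkConf, TaskContext}

// Hypothetical skeleton of a shuffle implementation that could
// co-locate one reducer's blocks at the map phase. Plugged in via:
//   --conf spark.shuffle.manager=org.apache.spark.shuffle.PushShuffleManager
class PushShuffleManager(conf: SparkConf) extends ShuffleManager {

  // Register a shuffle with this manager; the handle is passed back
  // to map and reduce tasks.
  override def registerShuffle[K, V, C](
      shuffleId: Int,
      dependency: ShuffleDependency[K, V, C]): ShuffleHandle = ???

  // Writer used by a map task; this is where pushed/merged block
  // placement would be decided.
  override def getWriter[K, V](
      handle: ShuffleHandle,
      mapId: Long,
      context: TaskContext,
      metrics: ShuffleWriteMetricsReporter): ShuffleWriter[K, V] = ???

  // Reader used by a reduce task for its partition range.
  override def getReader[K, C](
      handle: ShuffleHandle,
      startPartition: Int,
      endPartition: Int,
      context: TaskContext,
      metrics: ShuffleReadMetricsReporter): ShuffleReader[K, C] = ???

  override def unregisterShuffle(shuffleId: Int): Boolean = ???
  override def shuffleBlockResolver: ShuffleBlockResolver = ???
  override def stop(): Unit = ()
}
```

The open question in the thread is whether push-based shuffle fits entirely behind this interface, or whether it needs hooks (e.g. in the external shuffle service) that the plugin API does not yet expose.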