hadoop-aws versions (was Re: [VOTE] Spark 2.3.1 (RC4))

2018-06-26 Thread Steve Loughran
Following up after a reference to this in https://issues.apache.org/jira/browse/HADOOP-15559: the AWS SDK is a very fast-moving project, with a release cycle of ~2 weeks, but it's in the state Fred Brooks described, "the number of bugs is constant, they just move around"; bumping up an AWS release

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-04 Thread Xiao Li
+1 On Mon, Jun 4, 2018 at 12:44 PM Henry Robinson wrote: > +1 (non-binding) > > On 4 June 2018 at 11:15, Bryan Cutler wrote: > >> +1 >> >> On Mon, Jun 4, 2018 at 10:18 AM, Joseph Bradley >> wrote: >> >>> +1 >>> >>> On Mon, Jun 4, 2018 at 10:16 AM, Mark Hamstra >>> wrote: >>> +1

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-04 Thread Henry Robinson
+1 (non-binding) On 4 June 2018 at 11:15, Bryan Cutler wrote: > +1 > > On Mon, Jun 4, 2018 at 10:18 AM, Joseph Bradley > wrote: > >> +1 >> >> On Mon, Jun 4, 2018 at 10:16 AM, Mark Hamstra >> wrote: >> >>> +1 >>> >>> On Fri, Jun 1, 2018 at 3:29 PM Marcelo Vanzin >>> wrote: >>> Please

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-04 Thread Bryan Cutler
+1 On Mon, Jun 4, 2018 at 10:18 AM, Joseph Bradley wrote: > +1 > > On Mon, Jun 4, 2018 at 10:16 AM, Mark Hamstra > wrote: > >> +1 >> >> On Fri, Jun 1, 2018 at 3:29 PM Marcelo Vanzin >> wrote: >> >>> Please vote on releasing the following candidate as Apache Spark version >>> 2.3.1. >>> >>>

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-04 Thread Joseph Bradley
+1 On Mon, Jun 4, 2018 at 10:16 AM, Mark Hamstra wrote: > +1 > > On Fri, Jun 1, 2018 at 3:29 PM Marcelo Vanzin wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.3.1. >> >> Given that I expect at least a few people to be busy with Spark Summit >> next >>

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-04 Thread Mark Hamstra
+1 On Fri, Jun 1, 2018 at 3:29 PM Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > Given that I expect at least a few people to be busy with Spark Summit next > week, I'm taking the liberty of setting an extended voting period. The

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-04 Thread John Zhuge
+1 On Sun, Jun 3, 2018 at 6:12 PM, Hyukjin Kwon wrote: > +1 > > On Sun, Jun 3, 2018 at 9:25 PM, Ricardo Almeida > wrote: > >> +1 (non-binding) >> >> On 3 June 2018 at 09:23, Dongjoon Hyun wrote: >> >>> +1 >>> >>> Bests, >>> Dongjoon. >>> >>> On Sat, Jun 2, 2018 at 8:09 PM, Denny Lee wrote: >>>

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-03 Thread Hyukjin Kwon
+1 On Sun, Jun 3, 2018 at 9:25 PM, Ricardo Almeida wrote: > +1 (non-binding) > > On 3 June 2018 at 09:23, Dongjoon Hyun wrote: > >> +1 >> >> Bests, >> Dongjoon. >> >> On Sat, Jun 2, 2018 at 8:09 PM, Denny Lee wrote: >> >>> +1 >>> >>> On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas < >>>

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-03 Thread Ricardo Almeida
+1 (non-binding) On 3 June 2018 at 09:23, Dongjoon Hyun wrote: > +1 > > Bests, > Dongjoon. > > On Sat, Jun 2, 2018 at 8:09 PM, Denny Lee wrote: > >> +1 >> >> On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas < >> nicholas.cham...@gmail.com> wrote: >> >>> I'll give that a try, but I'll still have

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-03 Thread Dongjoon Hyun
+1 Bests, Dongjoon. On Sat, Jun 2, 2018 at 8:09 PM, Denny Lee wrote: > +1 > > On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> I'll give that a try, but I'll still have to figure out what to do if >> none of the release builds work with hadoop-aws,

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Denny Lee
+1 On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas wrote: > I'll give that a try, but I'll still have to figure out what to do if none > of the release builds work with hadoop-aws, since Flintrock deploys Spark > release builds to set up a cluster. Building Spark is slow, so we only do > it if

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Nicholas Chammas
I'll give that a try, but I'll still have to figure out what to do if none of the release builds work with hadoop-aws, since Flintrock deploys Spark release builds to set up a cluster. Building Spark is slow, so we only do it if the user specifically requests a Spark version by git hash. (This is

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Wenchen Fan
+1 On Sun, Jun 3, 2018 at 6:54 AM, Marcelo Vanzin wrote: > If you're building your own Spark, definitely try the hadoop-cloud > profile. Then you don't even need to pull anything at runtime, > everything is already packaged with Spark. > > On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas >

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Marcelo Vanzin
If you're building your own Spark, definitely try the hadoop-cloud profile. Then you don't even need to pull anything at runtime, everything is already packaged with Spark. On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas wrote: > pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-02 Thread Sean Owen
+1 from me with the same comments as in the last RC. On Fri, Jun 1, 2018 at 5:29 PM Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > Given that I expect at least a few people to be busy with Spark Summit next > week, I'm taking the

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Nicholas Chammas
pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me either (even building with -Phadoop-2.7). I guess I’ve been relying on an unsupported pattern and will need to figure something else out going forward in order to use s3a://. On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
I have personally never tried to include hadoop-aws that way. But at the very least, I'd try to use the same version of Hadoop as the Spark build (2.7.3 IIRC). I don't really expect a different version to work, and if it did in the past it definitely was not by design. On Fri, Jun 1, 2018 at 5:50
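
A minimal sketch of the version-matching idea above, assuming the Spark distribution ships a jar named hadoop-common-<version>.jar in its jars/ directory. The helper names (bundled_hadoop_version, hadoop_aws_coordinate) are hypothetical, made up for illustration. Note that elsewhere in this thread hadoop-aws:2.7.3 is reported as not working either, so a matching version is a starting point, not a guarantee.

```python
import re
from typing import Iterable, Optional

# Assumption: the bundled Hadoop version can be read off the
# hadoop-common jar name, e.g. "hadoop-common-2.7.3.jar".
_HADOOP_COMMON = re.compile(r"^hadoop-common-(\d+(?:\.\d+)*)\.jar$")


def bundled_hadoop_version(jar_names: Iterable[str]) -> Optional[str]:
    """Return the version of the first hadoop-common jar found, or None."""
    for name in jar_names:
        match = _HADOOP_COMMON.match(name)
        if match:
            return match.group(1)
    return None


def hadoop_aws_coordinate(jar_names: Iterable[str]) -> str:
    """Build a hadoop-aws Maven coordinate matching the bundled Hadoop."""
    version = bundled_hadoop_version(jar_names)
    if version is None:
        raise ValueError("no hadoop-common jar found; cannot pick a version")
    return f"org.apache.hadoop:hadoop-aws:{version}"
```

To try it against a real install, pass os.listdir(os.path.join(os.environ["SPARK_HOME"], "jars")) as jar_names and hand the resulting coordinate to pyspark --packages.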

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Reynold Xin
+1 On Fri, Jun 1, 2018 at 3:29 PM Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > Given that I expect at least a few people to be busy with Spark Summit next > week, I'm taking the liberty of setting an extended voting period. The

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Nicholas Chammas
Building with -Phadoop-2.7 didn’t help, and if I remember correctly, building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release, so it appears something has changed since then. I wasn’t familiar with -Phadoop-cloud, but I can try that. My goal here is simply to confirm that this

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
Using the hadoop-aws package is probably going to be a little more complicated than that. The best bet is to use a custom build of Spark that includes it (use -Phadoop-cloud). Otherwise you're probably looking at some nasty dependency issues, especially if you end up mixing different versions of

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Mark Hamstra
There is no hadoop-2.8 profile. Use hadoop-2.7, which is effectively hadoop-2.7+. On Fri, Jun 1, 2018 at 4:01 PM Nicholas Chammas wrote: > I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4 > using Flintrock. However, trying > to load

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Nicholas Chammas
I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4 using Flintrock. However, trying to load the hadoop-aws package gave me some errors. $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4 :: problems summary :: WARNINGS

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
Starting with my own +1 (binding). On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > Given that I expect at least a few people to be busy with Spark Summit next > week, I'm taking the liberty of setting

[VOTE] Spark 2.3.1 (RC4)

2018-06-01 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1. Given that I expect at least a few people to be busy with Spark Summit next week, I'm taking the liberty of setting an extended voting period. The vote will be open until Friday, June 8th, at 19:00 UTC (that's 12:00