Re: spark-ec2 default to Hadoop 2

2015-03-02 Thread Nicholas Chammas
I might take a look at that PR if we get around to doing some perf testing of Spark on various resource managers. On Mon, Mar 2, 2015 at 12:22 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: FWIW there is a PR open to add support for Hadoop 2.4 to spark-ec2 scripts at

Re: spark-ec2 default to Hadoop 2

2015-03-02 Thread Shivaram Venkataraman
FWIW there is a PR open to add support for Hadoop 2.4 to spark-ec2 scripts at https://github.com/mesos/spark-ec2/pull/77 -- but it hasn't received enough review or testing to be merged. Thanks Shivaram On Sun, Mar 1, 2015 at 11:49 PM, Sean Owen so...@cloudera.com wrote: I agree with that. My

Re: spark-ec2 default to Hadoop 2

2015-03-01 Thread Patrick Wendell
Yeah, calling it Hadoop 2 was a very bad naming choice (of mine!); this was back when CDH4 was the only real distribution available with some of the newer Hadoop APIs and packaging. I think, to not surprise people using this, it's best to keep v1 as the default. Overall, we try not to change

spark-ec2 default to Hadoop 2

2015-03-01 Thread Nicholas Chammas
https://github.com/apache/spark/blob/fd8d283eeb98e310b1e85ef8c3a8af9e547ab5e0/ec2/spark_ec2.py#L162-L164 Is there any reason we shouldn't update the default Hadoop major version in spark-ec2 to 2? Nick
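For reference, the linked lines set this default through the script's optparse-based command-line parser. A minimal sketch of the relevant option, assuming the flag name and help text match the script at that commit (the proposal here is just flipping default="1" to default="2"):

    # spark_ec2.py (sketch): the default Hadoop major version for
    # clusters launched by spark-ec2.
    parser.add_option(
        "--hadoop-major-version", default="1",
        help="Major version of Hadoop (default: %default)")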

Re: spark-ec2 default to Hadoop 2

2015-03-01 Thread Shivaram Venkataraman
One reason I wouldn't change the default is that the Hadoop 2 launched by spark-ec2 is not a full Hadoop 2 distribution -- it's more of a hybrid Hadoop version built using CDH4 (it uses HDFS 2, but not YARN, AFAIK). Also, our default Hadoop version in the Spark build is still 1.0.4 [1], so it makes
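To illustrate the hybrid setup described above, a hypothetical sketch of the version-to-distribution mapping (the names below are illustrative only, not the actual spark-ec2 identifiers):

    # Hypothetical mapping, for illustration only: the "2" option picks
    # a CDH4-based hybrid (HDFS 2 APIs, no YARN), not full Apache Hadoop 2.
    HADOOP_MAJOR_VERSION_TO_DIST = {
        "1": "hadoop-1.0.4",  # plain Hadoop 1.x, matching Spark's build default
        "2": "cdh4",          # hybrid: HDFS 2, but not YARN
    }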

Re: spark-ec2 default to Hadoop 2

2015-03-01 Thread Sean Owen
I agree with that. My anecdotal impression is that Hadoop 1.x usage out there is maybe a couple of percent, so we should shift towards 2.x, at least as the default. On Sun, Mar 1, 2015 at 10:59 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: