Re: spark-ec2 default to Hadoop 2

2015-03-01 Thread Patrick Wendell
Yeah, calling it Hadoop 2 was a very bad naming choice (of mine!). This was back when CDH4 was the only real distribution available with some of the newer Hadoop APIs and packaging. To avoid surprising people using this, I think it's best to keep v1 as the default. Overall, we try not to change

spark-ec2 default to Hadoop 2

2015-03-01 Thread Nicholas Chammas
https://github.com/apache/spark/blob/fd8d283eeb98e310b1e85ef8c3a8af9e547ab5e0/ec2/spark_ec2.py#L162-L164

Is there any reason we shouldn't update the default Hadoop major version in spark-ec2 to 2?

Nick
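
For reference, a minimal sketch of the kind of optparse option definition at those lines (paraphrased from memory, not copied verbatim from the linked commit):

    from optparse import OptionParser

    parser = OptionParser()
    # The option under discussion: which Hadoop major version the
    # launched cluster uses. The proposal here is to change
    # default="1" to default="2".
    parser.add_option(
        "--hadoop-major-version", default="1",
        help="Major version of Hadoop (default: %default)")

    opts, args = parser.parse_args()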

Re: spark-ec2 default to Hadoop 2

2015-03-01 Thread Shivaram Venkataraman
One reason I wouldn't change the default is that the Hadoop 2 launched by spark-ec2 is not a full Hadoop 2 distribution: it's more of a hybrid Hadoop version built using CDH4 (it uses HDFS 2, but not YARN, AFAIK). Also, our default Hadoop version in the Spark build is still 1.0.4 [1], so it makes
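
To make the distinction concrete, here is a hypothetical sketch (the function name and return values are invented for illustration, not taken from spark_ec2.py) of what the flag actually selects:

    # Hypothetical mapping: what --hadoop-major-version installs.
    def hadoop_distribution(hadoop_major_version):
        if hadoop_major_version == "1":
            # Plain Apache Hadoop 1 line (matches the Spark build
            # default of 1.0.4).
            return "hadoop-1.0.4"
        elif hadoop_major_version == "2":
            # Not stock Apache Hadoop 2: a CDH4-based hybrid that
            # provides HDFS 2 but not YARN.
            return "cdh4"
        raise ValueError(
            "Unsupported Hadoop major version: %r" % hadoop_major_version)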

Re: spark-ec2 default to Hadoop 2

2015-03-01 Thread Sean Owen
I agree with that. My anecdotal impression is that Hadoop 1.x usage out there is maybe a couple of percent, so we should shift towards 2.x, at least as the default.

On Sun, Mar 1, 2015 at 10:59 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: