I might take a look at that pr if we get around to doing some perf testing
of Spark on various resource managers.
On Mon, Mar 2, 2015 at 12:22 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu
wrote:
FWIW there is a PR open to add support for Hadoop 2.4 to the spark-ec2 scripts
at https://github.com/mesos/spark-ec2/pull/77 -- but it hasn't received
enough review or testing to be merged.
Thanks
Shivaram
On Sun, Mar 1, 2015 at 11:49 PM, Sean Owen so...@cloudera.com wrote:
Yeah, calling it Hadoop 2 was a very bad naming choice (of mine!); this
was back when CDH4 was the only real distribution available with some
of the newer Hadoop APIs and packaging.
To avoid surprising people who use this, I think it's best to keep v1 as the
default; overall, we try not to change defaults.
https://github.com/apache/spark/blob/fd8d283eeb98e310b1e85ef8c3a8af9e547ab5e0/ec2/spark_ec2.py#L162-L164
Is there any reason we shouldn't update the default Hadoop major version in
spark-ec2 to 2?
Nick
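The default in question is set via an optparse flag in spark_ec2.py. As a minimal sketch of how such a default works (the option name and help text here are assumptions for illustration, not a verbatim copy of the linked lines):

```python
# Hypothetical sketch of how a script like spark_ec2.py could declare the
# Hadoop major version flag with optparse; "--hadoop-major-version" and the
# default of "1" mirror the discussion, but the exact code is assumed.
from optparse import OptionParser

parser = OptionParser()
parser.add_option(
    "--hadoop-major-version",
    default="1",
    help="Major version of Hadoop (default: %default)",
)

# Omitting the flag falls back to the default of "1".
(defaults, _) = parser.parse_args([])

# Passing the flag explicitly overrides the default.
(opts, _) = parser.parse_args(["--hadoop-major-version", "2"])
print(opts.hadoop_major_version)  # → 2
```

Changing the `default=` value is all a switch to Hadoop 2 would require syntactically; the debate in this thread is about whether doing so would surprise existing users.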
One reason I wouldn't change the default is that the Hadoop 2 launched by
spark-ec2 is not a full Hadoop 2 distribution -- it's more of a hybrid
Hadoop version built using CDH4 (it uses HDFS 2, but not YARN, AFAIK).
Also, our default Hadoop version in the Spark build is still 1.0.4 [1], so
it makes sense to match that.
I agree with that. My anecdotal impression is that Hadoop 1.x usage
out there is maybe a couple percent, and so we should shift towards
2.x at least as defaults.
On Sun, Mar 1, 2015 at 10:59 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote: