I've updated to Spark 1.2.0, and the EC2 scripts' persistent-hdfs behaviour appears to have changed.
My launch command is:

    spark-1.2.0-bin-hadoop2.4/ec2/spark-ec2 --instance-type=m3.xlarge -s 5 --ebs-vol-size=1000 launch myproject

When I ssh into the master I get:

    $ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/xvda1      7.9G  2.9G  5.0G  37% /
    tmpfs           7.3G     0  7.3G   0% /dev/shm
    /dev/xvdb        37G  1.3G   34G   4% /mnt
    /dev/xvdc        37G  177M   35G   1% /mnt2
    /dev/xvds      1000G   33M 1000G   1% /vol0

That /vol0 is where I want (and assume) persistent-hdfs to go. But when I look at the size I get:

    $ persistent-hdfs/bin/start-all.sh
    $ persistent-hdfs/bin/hadoop dfsadmin -report
    Warning: $HADOOP_HOME is deprecated.

    Configured Capacity: 42275430400 (39.37 GB)
    Present Capacity: 26448744448 (24.63 GB)
    DFS Remaining: 26448601088 (24.63 GB)
    DFS Used: 143360 (140 KB)
    DFS Used%: 0%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0

    -------------------------------------------------
    Datanodes available: 5 (5 total, 0 dead)

    Name: 10.46.11.156:60010
    Decommission Status : Normal
    Configured Capacity: 8455086080 (7.87 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 3165372416 (2.95 GB)
    DFS Remaining: 5289684992 (4.93 GB)
    DFS Used%: 0%
    DFS Remaining%: 62.56%
    Last contact: Fri Feb 13 17:41:46 UTC 2015

    Name: 10.41.51.155:60010
    Decommission Status : Normal
    Configured Capacity: 8455086080 (7.87 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 3165364224 (2.95 GB)
    DFS Remaining: 5289693184 (4.93 GB)
    DFS Used%: 0%
    DFS Remaining%: 62.56%
    Last contact: Fri Feb 13 17:41:46 UTC 2015

    Name: 10.38.30.254:60010
    Decommission Status : Normal
    Configured Capacity: 8455086080 (7.87 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 3165249536 (2.95 GB)
    DFS Remaining: 5289807872 (4.93 GB)
    DFS Used%: 0%
    DFS Remaining%: 62.56%
    Last contact: Fri Feb 13 17:41:46 UTC 2015

    Name: 10.204.134.84:60010
    Decommission Status : Normal
    Configured Capacity: 8455086080 (7.87 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 3165343744 (2.95 GB)
    DFS Remaining: 5289713664 (4.93 GB)
    DFS Used%: 0%
    DFS Remaining%: 62.56%
    Last contact: Fri Feb 13 17:41:46 UTC 2015

    Name: 10.33.15.134:60010
    Decommission Status : Normal
    Configured Capacity: 8455086080 (7.87 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 3165356032 (2.95 GB)
    DFS Remaining: 5289701376 (4.93 GB)
    DFS Used%: 0%
    DFS Remaining%: 62.56%
    Last contact: Fri Feb 13 17:41:46 UTC 2015

That's tiny. My suspicions are aroused when I see:

    $ ls /vol
    persistent-hdfs

/vol is on the small /dev/xvda1, not the large EBS volume /dev/xvds.

I thought I'd be able to edit persistent-hdfs/conf/core-site.xml to change the volume:

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/vol0/persistent-hdfs</value>  <!-- was /vol/persistent-hdfs -->
    </property>

and then restart with:

    $ persistent-hdfs/bin/stop-all.sh && persistent-hdfs/bin/start-all.sh

But when I do that, persistent HDFS won't start for whatever reason, and I can no longer run the report:

    $ persistent-hdfs/bin/hadoop dfsadmin -report
    15/02/13 18:50:25 INFO ipc.Client: Retrying connect to server: ec2-54-70-252-81.us-west-2.compute.amazonaws.com/10.23.161.84:9010. Already tried 0 time(s).
    15/02/13 18:50:26 INFO ipc.Client: Retrying connect to server: ec2-54-70-252-81.us-west-2.compute.amazonaws.com/10.23.161.84:9010. Already tried 1 time(s).
    15/02/13 18:50:27 INFO ipc.Client: Retrying connect to server: ec2-54-70-252-81.us-west-2.compute.amazonaws.com/10.23.161.84:9010. Already tried 2 time(s).

So it looks like I can't use the EBS volume for persistent-hdfs. I was doing this before, so something must have changed in the last couple of weeks (last time I was using 1.1.0). Is this a bug? Has the behaviour of AWS changed? Am I doing something stupid? How do I fix it?

Thanks in advance!
Joe
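Update: I also suspect that editing hadoop.tmp.dir in core-site.xml alone may not be enough. If the spark-ec2 Hadoop config also pins dfs.name.dir / dfs.data.dir in persistent-hdfs/conf/hdfs-site.xml (I haven't verified that it does — this is an assumption), those would need repointing too, the edited conf would need re-syncing to the slaves, and the namenode would presumably need formatting since the new directory starts empty. A sketch of the hdfs-site.xml change I have in mind (the dfs/name and dfs/data sub-paths are my guess at the standard Hadoop 1.x layout, not something I've confirmed in the template):

    <!-- persistent-hdfs/conf/hdfs-site.xml — hypothetical sketch -->
    <property>
      <name>dfs.name.dir</name>
      <value>/vol0/persistent-hdfs/dfs/name</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/vol0/persistent-hdfs/dfs/data</value>
    </property>

followed by something like `hadoop namenode -format` on the master before start-all.sh, and pushing the conf directory out to the slaves (I believe the spark-ec2 AMI ships a copy-dir script for that, but I may be misremembering the name). I haven't got this working yet, so treat it as a hypothesis rather than a fix.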
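Update: a quick sanity check on the numbers, in case it helps. The per-datanode "Configured Capacity" of 8455086080 bytes (7.87 GB) is suspiciously close to the 7.9G root volume /dev/xvda1, and five of them add up exactly to the reported cluster total, which is why I think the datanodes are on the root disk rather than the EBS volume (figures below are copied from my dfsadmin report, not measured fresh):

```shell
# Per-datanode "Configured Capacity" from the dfsadmin report above.
per_node=8455086080          # 7.87 GB — roughly the size of the 7.9G root volume /dev/xvda1
datanodes=5

# Five datanodes on the root disk would give exactly the reported cluster total.
total=$((per_node * datanodes))
echo "$total"                # 42275430400 — matches "Configured Capacity: 42275430400 (39.37 GB)"
```

So the arithmetic at least is consistent with all five datanodes writing to the 7.9G root volume instead of the 1000G EBS volume.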