I've updated to Spark 1.2.0, and the EC2 scripts' persistent-hdfs behaviour appears to have changed.
My launch command is:

    spark-1.2.0-bin-hadoop2.4/ec2/spark-ec2 --instance-type=m3.xlarge -s 5 --ebs-vol-size=1000 launch myproject

When I ssh into the master I get:

    $ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/xvda1      7.9G  2.9G  5.0G  37% /
    tmpfs           7.3G     0  7.3G   0% /dev/shm
    /dev/xvdb        37G  1.3G   34G   4% /mnt
    /dev/xvdc        37G  177M   35G   1% /mnt2
    /dev/xvds      1000G   33M 1000G   1% /vol0

That /vol0 is where I want (and assume) persistent-hdfs to go. But when I look at the size I get:

    $ persistent-hdfs/bin/start-all.sh
    $ persistent-hdfs/bin/hadoop dfsadmin -report
    Warning: $HADOOP_HOME is deprecated.

    Configured Capacity: 42275430400 (39.37 GB)
    Present Capacity: 26448744448 (24.63 GB)
    DFS Remaining: 26448601088 (24.63 GB)
    DFS Used: 143360 (140 KB)
    DFS Used%: 0%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0

    -------------------------------------------------
    Datanodes available: 5 (5 total, 0 dead)

    Name: 10.46.11.156:60010
    Decommission Status : Normal
    Configured Capacity: 8455086080 (7.87 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 3165372416 (2.95 GB)
    DFS Remaining: 5289684992 (4.93 GB)
    DFS Used%: 0%
    DFS Remaining%: 62.56%
    Last contact: Fri Feb 13 17:41:46 UTC 2015

    Name: 10.41.51.155:60010
    Decommission Status : Normal
    Configured Capacity: 8455086080 (7.87 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 3165364224 (2.95 GB)
    DFS Remaining: 5289693184 (4.93 GB)
    DFS Used%: 0%
    DFS Remaining%: 62.56%
    Last contact: Fri Feb 13 17:41:46 UTC 2015

    Name: 10.38.30.254:60010
    Decommission Status : Normal
    Configured Capacity: 8455086080 (7.87 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 3165249536 (2.95 GB)
    DFS Remaining: 5289807872 (4.93 GB)
    DFS Used%: 0%
    DFS Remaining%: 62.56%
    Last contact: Fri Feb 13 17:41:46 UTC 2015

    Name: 10.204.134.84:60010
    Decommission Status : Normal
    Configured Capacity: 8455086080 (7.87 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 3165343744 (2.95 GB)
    DFS Remaining: 5289713664 (4.93 GB)
    DFS Used%: 0%
    DFS Remaining%: 62.56%
    Last contact: Fri Feb 13 17:41:46 UTC 2015

    Name: 10.33.15.134:60010
    Decommission Status : Normal
    Configured Capacity: 8455086080 (7.87 GB)
    DFS Used: 28672 (28 KB)
    Non DFS Used: 3165356032 (2.95 GB)
    DFS Remaining: 5289701376 (4.93 GB)
    DFS Used%: 0%
    DFS Remaining%: 62.56%
    Last contact: Fri Feb 13 17:41:46 UTC 2015

That's tiny. My suspicions are aroused when I see:

    $ ls /vol
    persistent-hdfs

/vol is on the small /dev/xvda1, not the large EBS volume /dev/xvds.

I thought I'd be able to edit persistent-hdfs/conf/core-site.xml to change the volume:

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/vol0/persistent-hdfs</value>  <!-- was /vol/persistent-hdfs -->
    </property>

and then restart with:

    $ persistent-hdfs/bin/stop-all.sh && persistent-hdfs/bin/start-all.sh

But when I do that, persistent HDFS won't start for whatever reason, and I can no longer run the report:

    $ persistent-hdfs/bin/hadoop dfsadmin -report
    15/02/13 18:50:25 INFO ipc.Client: Retrying connect to server: ec2-54-70-252-81.us-west-2.compute.amazonaws.com/10.23.161.84:9010. Already tried 0 time(s).
    15/02/13 18:50:26 INFO ipc.Client: Retrying connect to server: ec2-54-70-252-81.us-west-2.compute.amazonaws.com/10.23.161.84:9010. Already tried 1 time(s).
    15/02/13 18:50:27 INFO ipc.Client: Retrying connect to server: ec2-54-70-252-81.us-west-2.compute.amazonaws.com/10.23.161.84:9010. Already tried 2 time(s).

So it looks like I can't use the EBS volume for persistent-hdfs. I was doing this before, so something must have changed in the last couple of weeks (last time I was using 1.1.0). Is this a bug? Has the behaviour of AWS changed? Am I doing something stupid? How do I fix it?

Thanks in advance!
Joe
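Update: I also suspect that editing hadoop.tmp.dir in core-site.xml alone may not be enough. If the spark-ec2 Hadoop config also pins dfs.name.dir / dfs.data.dir in persistent-hdfs/conf/hdfs-site.xml (I haven't verified that it does — this is an assumption), those would need repointing too, the edited conf would need re-syncing to the slaves, and the namenode would presumably need formatting since the new directory starts empty. A sketch of the hdfs-site.xml change I have in mind (the dfs/name and dfs/data sub-paths are my guess at the standard Hadoop 1.x layout, not something I've confirmed in the template):

    <!-- persistent-hdfs/conf/hdfs-site.xml — hypothetical sketch -->
    <property>
      <name>dfs.name.dir</name>
      <value>/vol0/persistent-hdfs/dfs/name</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/vol0/persistent-hdfs/dfs/data</value>
    </property>

followed by something like `hadoop namenode -format` on the master before start-all.sh, and pushing the conf directory out to the slaves (I believe the spark-ec2 AMI ships a copy-dir script for that, but I may be misremembering the name). I haven't got this working yet, so treat it as a hypothesis rather than a fix.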
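Update: a quick sanity check on the numbers, in case it helps. The per-datanode "Configured Capacity" of 8455086080 bytes (7.87 GB) is suspiciously close to the 7.9G root volume /dev/xvda1, and five of them add up exactly to the reported cluster total, which is why I think the datanodes are on the root disk rather than the EBS volume (figures below are copied from my dfsadmin report, not measured fresh):

```shell
# Per-datanode "Configured Capacity" from the dfsadmin report above.
per_node=8455086080          # 7.87 GB — roughly the size of the 7.9G root volume /dev/xvda1
datanodes=5

# Five datanodes on the root disk would give exactly the reported cluster total.
total=$((per_node * datanodes))
echo "$total"                # 42275430400 — matches "Configured Capacity: 42275430400 (39.37 GB)"
```

So the arithmetic at least is consistent with all five datanodes writing to the 7.9G root volume instead of the 1000G EBS volume.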