Re: Error: No space left on device

2014-07-17 Thread Chris DuBois
Hi Xiangrui, Thanks. I have taken your advice and set all 5 of my slaves to be c3.4xlarge. In this case /mnt and /mnt2 have plenty of space by default. I now do sc.textFile(blah).repartition(N).map(...).cache() with N=80, spark.executor.memory set to 20gb, and --driver-memory 20g. So far things

Re: Error: No space left on device

2014-07-17 Thread Bill Jay
Hi, I also have some issues with repartition. In my program, I consume data from Kafka. After I consume data, I use repartition(N). However, although I set N to be 120, there are around 18 executors allocated for my reduce stage. I am not sure how the repartition command works to ensure the

Error: No space left on device

2014-07-16 Thread Chris DuBois
Hi all, I am encountering the following error: INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: No space left on device [duplicate 4] For each slave, df -h looks roughly like this, which makes the above error surprising. Filesystem Size Used Avail Use% Mounted

Re: Error: No space left on device

2014-07-16 Thread Xiangrui Meng
Check the number of inodes (df -i). The assembly build may create many small files. -Xiangrui On Tue, Jul 15, 2014 at 11:35 PM, Chris DuBois chris.dub...@gmail.com wrote: Hi all, I am encountering the following error: INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: No
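
A minimal sketch of the inode check suggested above, for anyone who wants to script it across nodes instead of reading df -i by hand. It uses only the Python standard library (os.statvfs, Unix only). The helper name inode_usage and the mount points in the example loop are illustrative assumptions, not something from the thread's setup.

```python
import os


def inode_usage(path):
    """Return (total, used, free, percent_used) inode counts for the
    filesystem that holds `path`, read via os.statvfs."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes on the filesystem
    free = st.f_ffree    # free inodes
    used = total - free
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    percent = 100.0 * used / total if total else 0.0
    return total, used, free, percent


if __name__ == "__main__":
    # Mount points are assumptions; adjust to the node being checked.
    for mount in ("/", "/mnt", "/mnt2"):
        if os.path.exists(mount):
            total, used, free, pct = inode_usage(mount)
            print(f"{mount}: inodes={total} used={used} free={free} ({pct:.0f}%)")
```

A filesystem can report "No space left on device" with free bytes remaining if its free inode count reaches zero, which is why this check is worth running alongside df -h.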

Re: Error: No space left on device

2014-07-16 Thread Chris DuBois
df -i # on a slave Filesystem Inodes IUsed IFree IUse% Mounted on /dev/xvda1 524288 277701 246587 53% / tmpfs 1917974 1 1917973 1% /dev/shm On Tue, Jul 15, 2014 at 11:39 PM, Xiangrui Meng men...@gmail.com wrote: Check the number of inodes

Re: Error: No space left on device

2014-07-16 Thread Chris Gore
Hi Chris, I've encountered this error when running Spark’s ALS methods too. In my case, it was because I set spark.local.dir improperly, and every time there was a shuffle, it would spill many GB of data onto the local drive. What fixed it was setting it to use the /mnt directory, where a
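
Since the message above traces the failure to shuffle data spilling onto the wrong drive via spark.local.dir, here is a hedged sketch for comparing free space across candidate scratch directories before picking one. It uses only the Python standard library. The helper names free_bytes_by_dir and roomiest and the candidate paths are hypothetical, and the script does not touch Spark configuration itself.

```python
import os
import shutil


def free_bytes_by_dir(candidates):
    """Map each existing candidate directory to the free bytes on its filesystem."""
    return {d: shutil.disk_usage(d).free for d in candidates if os.path.isdir(d)}


def roomiest(candidates):
    """Return the candidate directory with the most free space, or None if none exist."""
    usage = free_bytes_by_dir(candidates)
    return max(usage, key=usage.get) if usage else None


if __name__ == "__main__":
    # Candidate paths are assumptions; list whatever volumes the node actually has.
    candidates = ["/", "/mnt", "/mnt2", "/vol"]
    for d, free in sorted(free_bytes_by_dir(candidates).items()):
        print(f"{d}: {free / 2**30:.1f} GiB free")
    print("most free space:", roomiest(candidates))
```

The result is only a starting point: the chosen directory still has to be writable by the Spark processes on every node.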

Re: Error: No space left on device

2014-07-16 Thread Chris DuBois
Thanks for the quick responses! I used your final -Dspark.local.dir suggestion, but I see this during the initialization of the application: 14/07/16 06:56:08 INFO storage.DiskBlockManager: Created local directory at /vol/spark-local-20140716065608-7b2a I would have expected something in

Re: Error: No space left on device

2014-07-16 Thread Xiangrui Meng
Hi Chris, Could you also try `df -i` on the master node? How many blocks/partitions did you set? In the current implementation, ALS doesn't clean the shuffle data because the operations are chained together. But it shouldn't run out of disk space on the MovieLens dataset, which is small.

Re: Error: No space left on device

2014-07-16 Thread Chris DuBois
Hi Xiangrui, Here is the result on the master node: $ df -i Filesystem Inodes IUsed IFree IUse% Mounted on /dev/xvda1 524288 273997 250291 53% / tmpfs 1917974 1 1917973 1% /dev/shm /dev/xvdv 524288000 30 524287970 1% /vol I

Re: Error: No space left on device

2014-07-16 Thread Chris DuBois
Hi Xiangrui, I accidentally did not send df -i for the master node. Here it is at the moment of failure: Filesystem Inodes IUsed IFree IUse% Mounted on /dev/xvda1 524288 280938 243350 54% / tmpfs 3845409 1 3845408 1% /dev/shm /dev/xvdb

Re: Error: No space left on device

2014-07-16 Thread Chris DuBois
Hi Xiangrui, I will try this shortly. When using N partitions, do you recommend N be the number of cores on each slave or the number of cores on the master? Forgive my ignorance, but is this best achieved as an argument to sc.textFile? The slaves on the EC2 clusters start with only 8gb of