Hi Xiangrui,
Thanks. I have taken your advice and set all 5 of my slaves to be
c3.4xlarge. In this case /mnt and /mnt2 have plenty of space by default. I
now do sc.textFile(blah).repartition(N).map(...).cache() with N=80,
spark.executor.memory set to 20g, and --driver-memory 20g. So far things
Hi,
I also have some issues with repartition. In my program, I consume data
from Kafka. After I consume data, I use repartition(N). However, although I
set N to be 120, there are around 18 executors allocated for my reduce
stage. I am not sure how the repartition command works to ensure the
Hi all,
I am encountering the following error:
INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: No
space left on device [duplicate 4]
For each slave, df -h looks roughly like this, which makes the above error
surprising.
Filesystem      Size  Used Avail Use% Mounted on
Check the number of inodes (df -i). The assembly build may create many
small files. -Xiangrui
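That check is easy to script. A minimal sketch of filtering `df -i` output for
nearly-full filesystems — the 80% threshold and the sample output below are
made up for illustration, not taken from a real host:

```shell
# Flag filesystems whose inode usage (IUse%) exceeds a threshold.
# The sample text stands in for real `df -i` output; in practice you
# would pipe `df -i` itself into the same awk filter.
sample='Filesystem      Inodes  IUsed   IFree IUse% Mounted on
/dev/xvda1      524288 498123   26165   95% /
tmpfs          1917974      1 1917973    1% /dev/shm'

# Skip the header, strip the % with numeric coercion, keep rows over 80%.
full=$(printf '%s\n' "$sample" | awk 'NR > 1 && $5 + 0 > 80 { print $1 }')
echo "$full"   # prints /dev/xvda1
```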
On Tue, Jul 15, 2014 at 11:35 PM, Chris DuBois chris.dub...@gmail.com wrote:
Hi all,
I am encountering the following error:
INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: No
df -i # on a slave
Filesystem      Inodes  IUsed   IFree IUse% Mounted on
/dev/xvda1      524288 277701  246587   53% /
tmpfs          1917974      1 1917973    1% /dev/shm
On Tue, Jul 15, 2014 at 11:39 PM, Xiangrui Meng men...@gmail.com wrote:
Check the number of inodes
Hi Chris,
I've encountered this error when running Spark’s ALS methods too. In my case,
it was because I set spark.local.dir improperly, and every time there was a
shuffle, it would spill many GB of data onto the local drive. What fixed it
was setting it to use the /mnt directory, where a
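A minimal sketch of how that setting can be passed, assuming a Spark 1.x
EC2/standalone layout — the /mnt,/mnt2 paths and the SPARK_JAVA_OPTS route are
illustrative, not the exact lines from my cluster:

```shell
# In spark-env.sh on each node: point shuffle spill at the large
# ephemeral drives instead of the small root volume.
# (Paths are an assumption based on the /mnt layout discussed above.)
export SPARK_JAVA_OPTS="-Dspark.local.dir=/mnt/spark,/mnt2/spark"
```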
Thanks for the quick responses!
I used your final -Dspark.local.dir suggestion, but I see this during the
initialization of the application:
14/07/16 06:56:08 INFO storage.DiskBlockManager: Created local directory at
/vol/spark-local-20140716065608-7b2a
I would have expected something in
Hi Chris,
Could you also try `df -i` on the master node? How many
blocks/partitions did you set?
In the current implementation, ALS doesn't clean the shuffle data
because the operations are chained together. But it shouldn't run out
of disk space on the MovieLens dataset, which is small.
Hi Xiangrui,
Here is the result on the master node:
$ df -i
Filesystem        Inodes  IUsed     IFree IUse% Mounted on
/dev/xvda1        524288 273997    250291   53% /
tmpfs            1917974      1   1917973    1% /dev/shm
/dev/xvdv      524288000     30 524287970    1% /vol
I
Hi Xiangrui,
I accidentally did not send df -i for the master node. Here it is at the
moment of failure:
Filesystem      Inodes  IUsed   IFree IUse% Mounted on
/dev/xvda1      524288 280938  243350   54% /
tmpfs          3845409      1 3845408    1% /dev/shm
/dev/xvdb
Hi Xiangrui,
I will try this shortly. When using N partitions, do you recommend N be the
number of cores on each slave or the number of cores on the master? Forgive
my ignorance, but is this best achieved as an argument to sc.textFile?
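For context, the back-of-the-envelope I am working from: the usual
rule of thumb (from Spark's tuning guidance, not this thread) is roughly 2-3
partitions per CPU core across the whole cluster, not per master. With the 5
c3.4xlarge slaves mentioned earlier, a sketch — the factor of 3 is an
assumption to tune empirically:

```shell
# Rule-of-thumb partition count: ~2-3 tasks per core, cluster-wide.
slaves=5        # 5 c3.4xlarge slaves (from this thread)
cores_per=16    # a c3.4xlarge has 16 vCPUs
factor=3        # assumed tasks-per-core multiplier
n=$((slaves * cores_per * factor))
echo "$n"       # prints 240
```

The count can also be given directly as the second argument to
sc.textFile(path, minPartitions), which avoids the extra shuffle that a
separate repartition(N) incurs.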
The slaves on the EC2 clusters start with only 8 GB of