Another thing I have noticed is that out of my master + 15 slaves, two
slaves always carry a higher inode load. For example, right now I am
running an intensive job that takes about an hour to finish, and two
slaves have been showing an increase in inode consumption (they are
about 10% above the …
Patrick, correct. I have a 16-node cluster. On 14 machines out of 16,
the inode usage was about 50%. On two of the slaves, one had inode usage
of 96% and on the other it was 100%. When I went into /tmp on these two
nodes, there were a bunch of /tmp/spark* subdirectories, which I
deleted. This r…
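A minimal sketch of how to check this, assuming /tmp lives on the root
filesystem: df's -i flag reports inode usage (the IUse% column) instead
of block usage.

    # "No space left on device" can mean exhausted inodes even while
    # df -h still shows free blocks; -i reports inode counts instead.
    df -i /
    df -i /tmp   # only differs if /tmp is a separate mount, e.g. tmpfs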
Ognen - just so I understand. The issue is that there weren't enough
inodes and this was causing a "No space left on device" error? Is that
correct? If so, that's good to know, because it's definitely
counterintuitive.
On Sun, Mar 23, 2014 at 8:36 PM, Ognen Duzlevski wrote:
I would love to work on this (and other) stuff if I can bother someone
with questions offline or on a dev mailing list.
Ognen
On 3/23/14, 10:04 PM, Aaron Davidson wrote:
Thanks for bringing this up; 100% inode utilization is an issue I haven't
seen raised before, and it raises another issue which is not on our
current roadmap for state cleanup (cleaning up data which was not fully
cleaned up from a crashed process).
On Sun, Mar 23, 2014 at 7:57 PM, Ognen Duzlevski wrote:
Bleh, strike that, one of my slaves was at 100% inode utilization on the
file system. It was /tmp/spark* leftovers that apparently did not get
cleaned up properly after failed or interrupted jobs.
Mental note: run a cron job on all slaves and the master to clean up
/tmp/spark* regularly, along the lines of the sketch below.
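A minimal crontab sketch, assuming the default /tmp location and a
made-up two-day retention window (adjust both before using; deleting
scratch space out from under a running job will break it):

    # Hypothetical /etc/cron.d/spark-tmp-cleanup: nightly at 03:00, remove
    # Spark scratch dirs under /tmp not modified for 2+ days.
    0 3 * * * root find /tmp -maxdepth 1 -name 'spark*' -mtime +2 -exec rm -rf {} +

The -mtime guard is only a heuristic for "not in use"; pointing
spark.local.dir at a dedicated disk, as Matei suggests below, is the more
robust fix.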
Thanks!
Aaron, thanks for replying. I am very much puzzled as to what is going
on. A job that used to run on the same cluster is failing with this
mysterious message about not having enough disk space, when in fact I can
see through "watch df -h" that the free space is always hovering around
3+ GB on the …
By default, with P partitions (for both the pre-shuffle stage and
post-shuffle), there are P^2 files created.
With spark.shuffle.consolidateFiles turned on, we would instead create only
P files. Disk space consumption is largely unaffected, however, by the
number of partitions unless each partition …
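To make the scale concrete: at P = 1,000 partitions that is on the order
of 1,000,000 shuffle files, exactly the kind of load that exhausts
inodes. A sketch of turning consolidation on under Spark 0.9, assuming
configuration goes through conf/spark-env.sh:

    # conf/spark-env.sh (sketch): let map tasks share shuffle files so a
    # shuffle produces on the order of P files rather than P^2.
    export SPARK_JAVA_OPTS="-Dspark.shuffle.consolidateFiles=true"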
On 3/23/14, 5:49 PM, Matei Zaharia wrote:
You can set spark.local.dir to put this data somewhere other than /tmp
if /tmp is full. Actually it's recommended to have multiple local
disks and set it to a comma-separated list of directories, one per disk.
Matei, does the number of tasks/partitions i…
On 3/23/14, 5:35 PM, Aaron Davidson wrote:
On some systems, /tmp/ is an in-memory tmpfs file system, with its own
size limit. It's possible that this limit has been exceeded. You might
try running the "df" command to check the free space of "/tmp", or root
if tmp isn't listed.
3 GB also seems pretty low for the remaining free space of a disk.
You can set spark.local.dir to put this data somewhere other than /tmp if /tmp
is full. Actually it's recommended to have multiple local disks and set it to a
comma-separated list of directories, one per disk.
Matei
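A sketch of what that looks like under Spark 0.9 via conf/spark-env.sh;
/mnt/disk1 and /mnt/disk2 are made-up mount points:

    # conf/spark-env.sh (sketch): spread Spark's scratch space across two
    # example mounts, ideally on separate physical disks.
    export SPARK_JAVA_OPTS="-Dspark.local.dir=/mnt/disk1/spark,/mnt/disk2/spark"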
On Mar 23, 2014, at 3:35 PM, Aaron Davidson wrote:
On some systems, /tmp/ is an in-memory tmpfs file system, with its own size
limit. It's possible that this limit has been exceeded. You might try
running the "df" command to check the free space of "/tmp", or root if tmp
isn't listed.
3 GB also seems pretty low for the remaining free space of a disk.
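A quick way to check both points, assuming standard Linux coreutils:

    # Is /tmp a separate (possibly in-memory tmpfs) mount, and how full is it?
    df -h /tmp /
    mount | grep ' /tmp '   # prints the filesystem type if /tmp is its own mount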
Hello,
I have a weird error showing up when I run a job on my Spark cluster.
The version of Spark is 0.9, and I have 3+ GB free on the disk when this
error shows up. Any ideas what I should be looking for?
[error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task 167.0:3 failed …