Hey Matt,

The best way is definitely just to increase the ulimit if possible;
Spark more or less assumes that clusters will be able to raise it.

You might be able to hack around this by decreasing the number of
reducers, but this could have some performance implications for your
job.

In general, if a node in your cluster has C assigned cores and you run
a job with X reducers, then Spark will open C*X files in parallel and
start writing. Shuffle consolidation will help decrease the total
number of files created, but the number of file handles open at any
time doesn't change, so it won't help the ulimit problem.
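
To make the arithmetic concrete (the numbers here are made up purely
for illustration, not taken from your cluster):

    val coresPerNode = 8      // C: cores a single worker runs tasks on
    val numReducers  = 300    // X: reduce partitions in the job
    val ulimit       = 1024   // max open file descriptors per process
    val openAtOnce   = coresPerNode * numReducers
    println(s"~$openAtOnce shuffle files open at once, vs a ulimit of $ulimit")
    // => ~2400 open files, well over 1024, hence "Too many open files"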

This means you'll have to use fewer reducers (e.g. pass an explicit
number of partitions to reduceByKey) or use fewer cores on each
machine.
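
As a rough sketch of the first option (the paths, app name, local
master, and the concrete core/reducer numbers are placeholders, not
taken from your setup):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // pair-RDD implicits, needed on older releases

    object FewerReducersSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("fewer-reducers-sketch")
          .setMaster("local[4]")                         // placeholder master
          .set("spark.shuffle.consolidateFiles", "true") // fewer files on disk overall

        val sc = new SparkContext(conf)

        val words = sc.textFile("hdfs:///data/input.txt") // placeholder input path
          .flatMap(_.split("\\s+"))

        // With ~8 cores per node and a 1024 ulimit, ~100 reducers keeps
        // C*X comfortably under the limit.
        val numReducers = 100
        val counts = words.map(word => (word, 1)).reduceByKey(_ + _, numReducers)

        counts.saveAsTextFile("hdfs:///data/word-counts") // placeholder output path
        sc.stop()
      }
    }

Fewer reducers means larger reduce partitions, which is where the
performance tradeoff mentioned above comes from.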

- Patrick

On Mon, Mar 10, 2014 at 10:41 AM, Matthew Cheah
<matthew.c.ch...@gmail.com> wrote:
> Hi everyone,
>
> My team (cc'ed in this e-mail) and I are running a Spark reduceByKey
> operation on a cluster of 10 slaves where I don't have the privileges to set
> "ulimit -n" to a higher number. I'm running on a cluster where "ulimit -n"
> returns 1024 on each machine.
>
> When I attempt to run this job with the data originating from a text file,
> stored in an HDFS cluster running on the same nodes as the Spark cluster,
> the job crashes with the message, "Too many open files".
>
> My question is, why are so many files being created, and is there a way to
> configure the Spark context to avoid spawning that many files? I am already
> setting spark.shuffle.consolidateFiles to true.
>
> I want to repeat - I can't change the maximum number of open file
> descriptors on the machines. This cluster is not owned by me and the system
> administrator is responding quite slowly.
>
> Thanks,
>
> -Matt Cheah
