Limiting number of reducers: performance implications

2014-03-29 Thread Matthew Cheah
Hi everyone, I'm using Spark on machines where I can't change the maximum number of open files. As a result, I'm limiting the number of reducers to 500. I'm also only using a single machine that has 32 cores and emulating a cluster by running 4 worker daemons with 8 cores (maximum) each. What I'm ...
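
A minimal sketch of the setup described above, assuming Spark standalone mode of that era: the per-machine worker layout is normally set in conf/spark-env.sh, and the 500-reducer cap goes directly into the reduceByKey call. The master URL, input path, and key function below are hypothetical, not taken from the thread.

    import org.apache.spark.{SparkConf, SparkContext}

    // Emulated cluster from the message, assuming standalone mode:
    // conf/spark-env.sh on the single 32-core box would contain
    //   SPARK_WORKER_INSTANCES=4   (four worker daemons on the one machine)
    //   SPARK_WORKER_CORES=8       (8 cores offered by each daemon)
    val conf = new SparkConf()
      .setAppName("limited-reducers")
      .setMaster("spark://localhost:7077")   // hypothetical standalone master URL
    val sc = new SparkContext(conf)

    val pairs = sc.textFile("hdfs:///input/data").map(line => (line, 1))  // hypothetical input
    val reduced = pairs.reduceByKey(_ + _, 500)   // cap the reduce side at 500 partitions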

Re: "Too many open files" exception on reduceByKey

2014-03-11 Thread Matthew Cheah
... as to what these files are being used for? 2) Is this C*X files being opened on each machine? Also, is C the total number of cores among all machines in the cluster? Thanks, -Matt Cheah On Tue, Mar 11, 2014 at 4:35 PM, Matthew Cheah wrote: > Thanks. Just curious, is there a default number of ...
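
A worked version of the C*X estimate being asked about, plugging in the numbers that appear elsewhere in this thread (8 cores per worker, 500 reducers). This is an illustration of the per-machine count, not an exact statement of Spark's shuffle internals.

    // C = cores running map tasks concurrently on one machine,
    // X = number of reduce partitions passed to reduceByKey.
    // Each running map task keeps roughly one shuffle output file open per
    // reducer, so about C * X files are open at once on that machine.
    val c = 8             // cores per worker daemon in this thread's setup
    val x = 500           // reducer count mentioned in the follow-up message
    val openFiles = c * x // 8 * 500 = 4000, well above a "ulimit -n" of 1024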

Re: "Too many open files" exception on reduceByKey

2014-03-11 Thread Matthew Cheah
... you'll have to use fewer reducers (e.g. pass reduceByKey a number of reducers) or use fewer cores on each machine. - Patrick > On Mon, Mar 10, 2014 at 10:41 AM, Matthew Cheah wrote: > Hi everyone, > My team (cc'ed in this e-mail) and I ...
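
A minimal sketch of the first remedy quoted above, passing reduceByKey an explicit reducer count. The count of 100 is an arbitrary example chosen so that 8 cores * 100 reducers = 800 open files stays under a 1024 descriptor limit; the paths are hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}

    object FewerReducers {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("fewer-reducers"))
        val pairs = sc.textFile("hdfs:///some/input")          // hypothetical input path
          .map(line => (line.split("\t")(0), 1))
        // Remedy 1: pass an explicit reducer count instead of the default parallelism.
        val counts = pairs.reduceByKey(_ + _, 100)
        counts.saveAsTextFile("hdfs:///some/output")            // hypothetical output path
        sc.stop()
        // Remedy 2 (not shown in code): give each worker fewer cores, e.g. via
        // SPARK_WORKER_CORES, so fewer map tasks write shuffle files at once.
      }
    }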

"Too many open files" exception on reduceByKey

2014-03-10 Thread Matthew Cheah
Hi everyone, My team (cc'ed in this e-mail) and I are running a Spark reduceByKey operation on a cluster of 10 slaves where I don't have the privileges to set "ulimit -n" to a higher number. I'm running on a cluster where "ulimit -n" returns 1024 on each machine. When I attempt to run this job ...
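
As an aside not taken from this thread: Spark releases of this era (0.8.1 through the 1.x hash-shuffle days) also had an experimental spark.shuffle.consolidateFiles setting that reuses shuffle output files across map tasks, so fewer descriptors are open at once. Whether it applies to the poster's version is an assumption; it is not a substitute for raising "ulimit -n" or lowering the reducer count. A minimal sketch:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only; spark.shuffle.consolidateFiles was an experimental flag in
    // older Spark releases and was later removed along with the hash shuffle.
    val conf = new SparkConf()
      .setAppName("reduce-under-low-ulimit")
      .set("spark.shuffle.consolidateFiles", "true")
    val sc = new SparkContext(conf)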