Oops, the last reply didn't go to the user list. Mail app's fault.

Shuffling happens across the cluster, so you need to change the limit on all the
nodes in the cluster, not just the master.
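
For example, one common way to raise the limit permanently on each node is via
/etc/security/limits.conf (a sketch only: the exact file and the user account
that runs the Spark daemons depend on your setup, and "spark" below is an
assumed user name):

    # Check the current limit for the user that runs the Spark processes
    ulimit -n

    # /etc/security/limits.conf on every node
    spark  soft  nofile  65536
    spark  hard  nofile  65536

The daemons usually need to be restarted from a fresh login session before the
new limit takes effect.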



Sent from my iPhone

> On Aug 30, 2014, at 3:10, Sudha Krishna <skrishna...@gmail.com> wrote:
> 
> Hi,
> 
> Thanks for your response. Do you know if I need to change this limit on all 
> the cluster nodes or just the master?
> Thanks
> 
>> On Aug 29, 2014 11:43 AM, "Ye Xianjin" <advance...@gmail.com> wrote:
>> 1024 as the open-file limit is most likely too small for Linux machines in 
>> production. Try setting it to 65536, or to unlimited if you can. The "too 
>> many open files" error occurs because the shuffle creates a lot of 
>> intermediate files (if I'm wrong, please correct me).
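>> 
>> As a rough sketch of why this happens (assuming the default hash-based 
>> shuffle in Spark of this era): each map task writes one file per reduce 
>> task, so a node can end up with roughly (map tasks on the node) x (reduce 
>> tasks) shuffle files, which is why the limit is hit even though only 30 
>> input files are read. Besides raising the ulimit, one thing you could try 
>> is shuffle file consolidation, e.g.:
>> 
>>     import org.apache.spark.{SparkConf, SparkContext}
>> 
>>     // Consolidate shuffle outputs so fewer files are created per executor.
>>     // Check your Spark version's configuration docs before relying on this setting.
>>     val conf = new SparkConf()
>>       .setAppName("ShuffleConsolidationSketch")
>>       .set("spark.shuffle.consolidateFiles", "true")
>>     val sc = new SparkContext(conf)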
>> 
>> Sent from my iPhone
>> 
>> > On Aug 30, 2014, at 2:06, SK <skrishna...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > I am having the same problem reported by Michael. I am trying to open 30
>> > files. ulimit -n shows the limit is 1024, so I am not sure why the program
>> > is failing with a "Too many open files" error. The total size of all 30
>> > files is 230 GB.
>> > I am running the job on a cluster with 10 nodes, each having 16 GB. The
>> > error appears to be happening at the distinct() stage.
>> >
>> > Here is my program. In the following code, are all 10 nodes trying to
>> > open all of the 30 files, or are the files distributed among the 10 nodes?
>> >
>> >     val baseFile = "/mapr/mapr_dir/files_2013apr*"
>> >     val x = sc.textFile(baseFile).map { line =>
>> >       val fields = line.split("\t")
>> >       (fields(11), fields(6))
>> >     }.distinct().countByKey()
>> >     val xrdd = sc.parallelize(x.toSeq)
>> >     xrdd.saveAsTextFile(...)
>> >
>> > Instead of using the glob *, I guess I can try using a for loop to read the
>> > files one by one if that helps, but I am not sure whether there is a more
>> > efficient solution.
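>> >
>> > For what it's worth, a minimal sketch of that loop-based variant (the
>> > per-part paths below are hypothetical) could look like the following,
>> > although since the failing path is a shuffle file rather than an input
>> > file, reading the inputs one at a time would probably not avoid the error
>> > on its own:
>> >
>> >     // Hypothetical explicit list of input paths instead of the glob
>> >     val paths = Seq(
>> >       "/mapr/mapr_dir/files_2013apr_part1",
>> >       "/mapr/mapr_dir/files_2013apr_part2"
>> >     )
>> >     // Read each file separately and union the resulting RDDs
>> >     val combined = paths.map(p => sc.textFile(p)).reduce(_ union _)
>> >     val counts = combined.map { line =>
>> >       val fields = line.split("\t")
>> >       (fields(11), fields(6))
>> >     }.distinct().countByKey()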
>> >
>> > The following is the error transcript:
>> >
>> > Job aborted due to stage failure: Task 1.0:201 failed 4 times, most recent
>> > failure: Exception failure in TID 902 on host 192.168.13.11:
>> > java.io.FileNotFoundException:
>> > /tmp/spark-local-20140829131200-0bb7/08/shuffle_0_201_999 (Too many open
>> > files)
>> > java.io.FileOutputStream.open(Native Method)
>> > java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>> > org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:116)
>> > org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:177)
>> > org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:161)
>> > org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:158)
>> > scala.collection.Iterator$class.foreach(Iterator.scala:727)
>> > org.apache.spark.util.collection.AppendOnlyMap$$anon$1.foreach(AppendOnlyMap.scala:159)
>> > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
>> > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>> > org.apache.spark.scheduler.Task.run(Task.scala:51)
>> > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > java.lang.Thread.run(Thread.java:744) Driver stacktrace:
>> >
