If you're running on Ubuntu, run "ulimit -n"; it prints the maximum number of open files the shell (and the processes it starts) may hold at once. To raise it, set the nofile value in /etc/security/limits.conf to something like 10000 (one "soft" and one "hard" line for your user), then log out and log back in for the new limit to take effect.
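If you want to double-check what limit the Python driver process actually ends up with, here is a minimal sketch using only the standard-library resource module (the 10000 is just the example value from above, not a recommendation):

========================================
import resource

# Soft limit = what the process is held to right now;
# hard limit = the ceiling the soft limit can be raised to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open-file limits: soft=%d hard=%d' % (soft, hard))

# An unprivileged process may raise its own soft limit,
# but only up to the hard limit.
resource.setrlimit(resource.RLIMIT_NOFILE, (min(10000, hard), hard))
========================================

Keep in mind the executors are separate JVM processes, so they inherit whatever limit is in effect for the environment that launches them, not the driver's.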
Thanks,
Ron

Sent from my iPad

> On Aug 10, 2014, at 10:19 PM, Davies Liu <dav...@databricks.com> wrote:
>
>> On Fri, Aug 8, 2014 at 9:12 AM, Baoqiang Cao <bqcaom...@gmail.com> wrote:
>> Hi there,
>>
>> I ran into a problem and can’t find a solution.
>>
>> I was running bin/pyspark < ../python/wordcount.py
>
> You could use bin/spark-submit ../python/wordcount.py instead.
>
>> The wordcount.py is here:
>>
>> ========================================
>> import sys
>> from operator import add
>>
>> from pyspark import SparkContext
>>
>> datafile = '/mnt/data/m1.txt'
>>
>> sc = SparkContext()
>> outfile = datafile + '.freq'
>> lines = sc.textFile(datafile, 1)
>> counts = lines.flatMap(lambda x: x.split(' ')) \
>>               .map(lambda x: (x, 1)) \
>>               .reduceByKey(add)
>> output = counts.collect()
>>
>> outf = open(outfile, 'w')
>> for (word, count) in output:
>>     outf.write(word.encode('utf-8') + '\t' + str(count) + '\n')
>> outf.close()
>> ========================================
>>
>> The error message is here:
>>
>> 14/08/08 16:01:59 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 0)
>> java.io.FileNotFoundException:
>> /tmp/spark-local-20140808160150-d36b/12/shuffle_0_0_468 (Too many open files)
>
> This message means that Spark (the JVM) has reached the maximum number of
> open files; there is an fd leak somewhere, but unfortunately I cannot
> reproduce the problem. What version of Spark is this?
>
>>         at java.io.FileOutputStream.open(Native Method)
>>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>>         at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:107)
>>         at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:175)
>>         at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
>>         at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
>>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>         at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:744)
>>
>> The m1.txt is about 4 GB, and I have >120 GB RAM and used -Xmx120GB. It is
>> on Ubuntu. Any help please?
>>
>> Best,
>> Baoqiang Cao
>> Blog: http://baoqiang.org
>> Email: bqcaom...@gmail.com
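P.S. One more thought on the script itself (a sketch only, untested against your data): if the result set is large, you can let Spark write the output in parallel instead of collect()ing every (word, count) pair back through the driver:

========================================
from operator import add
from pyspark import SparkContext

datafile = '/mnt/data/m1.txt'

sc = SparkContext()
counts = (sc.textFile(datafile)
            .flatMap(lambda line: line.split(' '))
            .map(lambda word: (word, 1))
            .reduceByKey(add))

# saveAsTextFile writes one part file per partition, in parallel,
# into the output directory; nothing is funnelled through the driver.
counts.map(lambda wc: u'%s\t%d' % wc).saveAsTextFile(datafile + '.freq')
========================================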