If you're running on Ubuntu, run "ulimit -n"; it prints the maximum number of open files the shell (and the processes it starts) may hold at once. To raise it, set the nofile value in /etc/security/limits.conf to something like 10000 (one "soft" and one "hard" line for your user), then log out and log back in for the new limit to take effect.
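If you want to double-check what limit the Python driver process actually ends up with, here is a minimal sketch using only the standard-library resource module (the 10000 is just the example value from above, not a recommendation):

========================================
import resource

# Soft limit = what the process is held to right now;
# hard limit = the ceiling the soft limit can be raised to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open-file limits: soft=%d hard=%d' % (soft, hard))

# An unprivileged process may raise its own soft limit,
# but only up to the hard limit.
resource.setrlimit(resource.RLIMIT_NOFILE, (min(10000, hard), hard))
========================================

Keep in mind the executors are separate JVM processes, so they inherit whatever limit is in effect for the environment that launches them, not the driver's.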
Thanks,
Ron

Sent from my iPad

> On Aug 10, 2014, at 10:19 PM, Davies Liu <dav...@databricks.com> wrote:
>
>> On Fri, Aug 8, 2014 at 9:12 AM, Baoqiang Cao <bqcaom...@gmail.com> wrote:
>> Hi there,
>>
>> I ran into a problem and can’t find a solution.
>>
>> I was running bin/pyspark < ../python/wordcount.py
>
> You could use bin/spark-submit ../python/wordcount.py instead.
>
>> The wordcount.py is here:
>>
>> ========================================
>> import sys
>> from operator import add
>>
>> from pyspark import SparkContext
>>
>> datafile = '/mnt/data/m1.txt'
>>
>> sc = SparkContext()
>> outfile = datafile + '.freq'
>> lines = sc.textFile(datafile, 1)
>> counts = lines.flatMap(lambda x: x.split(' ')) \
>>               .map(lambda x: (x, 1)) \
>>               .reduceByKey(add)
>> output = counts.collect()
>>
>> outf = open(outfile, 'w')
>> for (word, count) in output:
>>     outf.write(word.encode('utf-8') + '\t' + str(count) + '\n')
>> outf.close()
>> ========================================
>>
>> The error message is here:
>>
>> 14/08/08 16:01:59 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 0)
>> java.io.FileNotFoundException:
>> /tmp/spark-local-20140808160150-d36b/12/shuffle_0_0_468 (Too many open files)
>
> This message means that Spark (the JVM) has reached the maximum number of
> open files; there is an fd leak somewhere, but unfortunately I cannot
> reproduce the problem. What version of Spark is this?
>
>>         at java.io.FileOutputStream.open(Native Method)
>>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>>         at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:107)
>>         at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:175)
>>         at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
>>         at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
>>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>         at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:744)
>>
>> The m1.txt is about 4 GB, and I have >120 GB RAM and used -Xmx120GB. It is
>> on Ubuntu. Any help please?
>>
>> Best,
>> Baoqiang Cao
>> Blog: http://baoqiang.org
>> Email: bqcaom...@gmail.com
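P.S. One more thought on the script itself (a sketch only, untested against your data): if the result set is large, you can let Spark write the output in parallel instead of collect()ing every (word, count) pair back through the driver:

========================================
from operator import add
from pyspark import SparkContext

datafile = '/mnt/data/m1.txt'

sc = SparkContext()
counts = (sc.textFile(datafile)
            .flatMap(lambda line: line.split(' '))
            .map(lambda word: (word, 1))
            .reduceByKey(add))

# saveAsTextFile writes one part file per partition, in parallel,
# into the output directory; nothing is funnelled through the driver.
counts.map(lambda wc: u'%s\t%d' % wc).saveAsTextFile(datafile + '.freq')
========================================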