Yes, the FileInputStream is closed. Maybe I didn't show it in the screenshot.
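For what it's worth, one way to make the close unconditional is try-with-resources (Java 7+). This is a hypothetical sketch of the loop shape being discussed, not the actual code from the screenshot; if `in.close()` sits at the end of the loop body, an exception thrown mid-read would skip it and leak the descriptor:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class StreamClose {

    // Hypothetical example: count the bytes in a file.
    // try-with-resources closes the stream even if read() throws,
    // which a plain in.close() at the end of the loop does not guarantee.
    static long countBytes(File f) throws IOException {
        long n = 0;
        try (FileInputStream in = new FileInputStream(f)) {
            while (in.read() != -1) {
                n++;
            }
        } // in is closed here, exception or not
        return n;
    }
}
```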
Since Spark implements sort-based shuffle, there is a parameter called the maximum merge factor which decides the number of files that can be merged at once, and this avoids too many open files. I suspect the error is related to this. Can someone confirm?

On Tue, Jan 5, 2016 at 11:19 PM, Annabel Melongo <melongo_anna...@yahoo.com> wrote:

> Vijay,
>
> Are you closing the FileInputStream at the end of each loop (in.close())?
> My guess is those streams aren't closed, and thus the "too many open files"
> exception.
>
> On Tuesday, January 5, 2016 8:03 AM, Priya Ch <learnings.chitt...@gmail.com> wrote:
>
> Can someone throw light on this?
>
> Regards,
> Padma Ch
>
> On Mon, Dec 28, 2015 at 3:59 PM, Priya Ch <learnings.chitt...@gmail.com> wrote:
>
> Chris, we are using Spark version 1.3.0. We have not set
> spark.streaming.concurrentJobs; it takes the default value.
>
> Vijay,
>
> From the stack trace it is evident that
> org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$1.apply$mcVI$sp(ExternalSorter.scala:730)
> is throwing the exception. I opened the Spark source code and visited the
> line which is throwing this exception, i.e.
>
> [image: Inline image 1]
>
> The line marked in red is throwing the exception. The file is
> ExternalSorter.scala in the org.apache.spark.util.collection package.
>
> I went through the following blog
> http://blog.cloudera.com/blog/2015/01/improving-sort-performance-in-apache-spark-its-a-double/
> and understood that there is a merge factor which decides the number of
> on-disk files that can be merged. Is the error somehow related to this?
>
> Regards,
> Padma CH
>
> On Fri, Dec 25, 2015 at 7:51 PM, Chris Fregly <ch...@fregly.com> wrote:
>
> And which version of Spark/Spark Streaming are you using?
>
> Are you explicitly setting spark.streaming.concurrentJobs to
> something larger than the default of 1?
> If so, please try setting it back to 1 and see if the problem still
> exists.
>
> This is a dangerous parameter to modify from the default, which is why
> it's not well documented.
>
> On Wed, Dec 23, 2015 at 8:23 AM, Vijay Gharge <vijay.gha...@gmail.com> wrote:
>
> A few indicators:
>
> 1) During execution, check the total number of open files using the lsof
> command (root permissions needed). If it is a cluster, I'm not sure how much that helps.
> 2) Which exact line in the code is triggering this error? Can you paste
> that snippet?
>
> On Wednesday 23 December 2015, Priya Ch <learnings.chitt...@gmail.com> wrote:
>
> ulimit -n 65000
>
> fs.file-max = 65000 (in the /etc/sysctl.conf file)
>
> Thanks,
> Padma Ch
>
> On Tue, Dec 22, 2015 at 6:47 PM, Yash Sharma <yash...@gmail.com> wrote:
>
> Could you share the ulimit for your setup, please?
>
> - Thanks, via mobile; excuse brevity.
>
> On Dec 22, 2015 6:39 PM, "Priya Ch" <learnings.chitt...@gmail.com> wrote:
>
> Jakob,
>
> I increased settings like fs.file-max in /etc/sysctl.conf and also
> increased the user limit in /etc/security/limits.conf, but I still see the
> same issue.
>
> On Fri, Dec 18, 2015 at 12:54 AM, Jakob Odersky <joder...@gmail.com> wrote:
>
> It might be a good idea to see how many files are open and try increasing
> the open file limit (this is done at the OS level). In some application
> use cases it is actually a legitimate need.
>
> If that doesn't help, make sure you close any unused files and streams in
> your code. It will also be easier to help diagnose the issue if you send an
> error-reproducing snippet.
>
> --
> Regards,
> Vijay Gharge
>
> --
> *Chris Fregly*
> Principal Data Solutions Engineer
> IBM Spark Technology Center, San Francisco, CA
> http://spark.tc | http://advancedspark.com
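As a complement to running lsof against the executor PIDs, the descriptor counts can also be read from inside the JVM. This is a minimal sketch using the `com.sun.management.UnixOperatingSystemMXBean`, which is HotSpot-specific and Unix-only, so treat it as a diagnostic aid rather than something portable:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdCheck {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // On HotSpot/Unix the bean exposes descriptor counts via this subtype
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            System.out.println("open fds: " + unix.getOpenFileDescriptorCount());
            System.out.println("max fds : " + unix.getMaxFileDescriptorCount());
        } else {
            System.out.println("descriptor counts not available on this JVM/OS");
        }
    }
}
```

Logging these two numbers periodically from the streaming job would show whether the process is creeping toward the ulimit or hitting it in a sudden burst during the shuffle merge.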