I don't think that is the issue. I have it set up to run in a thread pool
but I have set the pool size to 1 for this test until I get this resolved.
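To give a better idea of what I mean, here is roughly how I submit the jobs. The directory list and runJobForDirectory are just placeholders; the real code runs an action against a shared SparkContext for each directory:

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ThreadPoolSubmit {
  // Placeholder for the per-directory Spark job; the real code runs an
  // action against a shared SparkContext here.
  def runJobForDirectory(dir: String): Unit =
    println(s"would process $dir here")

  def main(args: Array[String]): Unit = {
    // Pool size is 1 for this test, so only one job runs at a time.
    val pool = Executors.newFixedThreadPool(1)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

    val dirs = Seq("/data/day1", "/data/day2", "/data/day3")
    val work = Future.traverse(dirs)(d => Future(runJobForDirectory(d)))

    Await.result(work, Duration.Inf)
    pool.shutdown()
  }
}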
I am having some problems using the Spark web portal since it is
picking a random port and, with the way my environment is set up, by the time
I have fi
Here is the code in which NewHadoopRDD registers a close handler that is
called when the task is completed (
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L136
).
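Roughly, the pattern in there looks like the simplified sketch below (not the exact NewHadoopRDD code): the reader is only closed when the task-completion listener fires, i.e. when the task itself finishes.

import org.apache.spark.TaskContext

// Simplified illustration of the close-on-task-completion pattern: the
// record reader stays open until the task that created it completes.
object CloseOnTaskCompletion {
  class RecordReaderLike {
    def close(): Unit = println("reader closed")
  }

  // Meant to be called inside a running task, where a TaskContext exists.
  def openReaderForPartition(context: TaskContext): RecordReaderLike = {
    val reader = new RecordReaderLike
    // The handle is only released when this listener fires at task completion.
    context.addTaskCompletionListener[Unit] { _ => reader.close() }
    reader
  }
}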
From my understanding, possibly the reason is that this `foreach` code in
your i
ah, now that does sound suspicious...
On 2 Sep 2015, at 14:09, Sigurd Knippenberg <sig...@knippenberg.com> wrote:
Yep. I know. It was set to 32K when I ran this test. If I bump it to 64K
the issue goes away. It still doesn't make sense to me that the Spark job
doesn't release its file handles until the end of the job instead of doing
that while my loop iterates.
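To be concrete, the shape of my loop is roughly the following; the paths and the per-directory action are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

object PerDirectoryLoop {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("per-directory-loop").setMaster("local[1]"))

    // One Spark action per directory. I would expect the file handles opened
    // for one directory to be released before the next iteration starts,
    // not held until the whole job ends.
    val dirs = Seq("hdfs:///data/day1", "hdfs:///data/day2", "hdfs:///data/day3")
    dirs.foreach { dir =>
      val count = sc.textFile(dir).count()
      println(s"$dir -> $count records")
    }

    sc.stop()
  }
}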
Sigurd
On Wed, Sep 2, 2015 at 4:33 AM, Steve
On 31 Aug 2015, at 19:49, Sigurd Knippenberg <sig...@knippenberg.com> wrote:
I know I can adjust the max open files allowed by the OS but I'd rather fix the
underlying issue.
bumping up the OS handle limits is step #1 of installing a Hadoop cluster:
https://wiki.apache.org/hadoop/TooM
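If you want to see where you stand while your loop runs, you can poll the JVM's own descriptor count from the driver (HotSpot on Unix only); a quick sketch:

import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

object FdCheck {
  // Open vs. maximum file descriptors for this JVM; handy for seeing
  // whether it is the driver process that is accumulating handles.
  def fdUsage(): Option[(Long, Long)] =
    ManagementFactory.getOperatingSystemMXBean match {
      case os: UnixOperatingSystemMXBean =>
        Some((os.getOpenFileDescriptorCount, os.getMaxFileDescriptorCount))
      case _ => None
    }

  def main(args: Array[String]): Unit =
    fdUsage().foreach { case (open, max) => println(s"open fds: $open / $max") }
}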
I am running into a 'too many open files' issue and before I posted this I
searched the web to see if anyone had a solution already to my
particular problem but I did not see anything that helped.
I know I can adjust the max open files allowed by the OS but I'd rather fix
the underlying issue.