spark timesout maybe due to binaryFiles() with more than 1 million files in HDFS

2015-06-08 Thread Kostas Kougios
… other HDFS directory that contains only 37k files. Any ideas how to resolve this? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-timesout-maybe-due-to-binaryFiles-with-more-than-1-million-files-in-HDFS-tp23208.html

RE: spark timesout maybe due to binaryFiles() with more than 1 million files in HDFS

2015-06-08 Thread Ewan Leith
… the same command with wholeTextFiles and had the same error. Ewan -----Original Message----- From: Kostas Kougios [mailto:kostas.koug...@googlemail.com] Sent: 08 June 2015 15:02 To: user@spark.apache.org Subject: spark timesout maybe due to binaryFiles() with more than 1 million files in HDFS I am reading …
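
For comparison, a minimal sketch (assuming a live SparkContext sc and a placeholder for the xmlDir path from the original post) of the two calls in question. Both plan their input splits from a full listing of the directory built on the driver side, which would fit wholeTextFiles failing in exactly the same way:

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumed setup, not from the thread: a tiny driver program for illustration.
    val sc = new SparkContext(new SparkConf().setAppName("binary-vs-wholetext"))
    val xmlDir = "hdfs:///data/xml"   // placeholder for the directory from the original post

    // Both calls enumerate every matching file when their splits are planned,
    // so with ~1 million small files they put the same pressure on the driver.
    val asBytes = sc.binaryFiles(xmlDir)      // RDD[(String, PortableDataStream)]
    val asText  = sc.wholeTextFiles(xmlDir)   // RDD[(String, String)]

    println(s"files (binaryFiles): ${asBytes.count()}, files (wholeTextFiles): ${asText.count()}")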

Re: spark timesout maybe due to binaryFiles() with more than 1 million files in HDFS

2015-06-08 Thread Konstantinos Kougios
… Subject: spark timesout maybe due to binaryFiles() with more than 1 million files in HDFS I am reading millions of xml files via val xmls = sc.binaryFiles(xmlDir). The operation runs fine locally but on yarn it fails …
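
A minimal sketch of that read, assuming the rest of the job is trivial; the HDFS path, the minPartitions hint and the spark-submit memory figure in the comments are illustrative guesses, not values from the thread. Because binaryFiles() builds the list of input files while planning splits on the driver, giving the driver (the YARN application master in yarn-cluster mode) a larger heap is usually the first thing to try:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.input.PortableDataStream

    val sc = new SparkContext(new SparkConf().setAppName("read-many-xmls"))

    // Placeholder path standing in for the poster's xmlDir.
    val xmlDir = "hdfs:///data/xml"

    // The call from the original post, with an explicit (optional) minPartitions hint.
    // With millions of files the split planning alone holds a lot of metadata in the
    // driver, so something like
    //   spark-submit --master yarn-cluster --driver-memory 8g ...
    // may be needed before the job even starts running tasks.
    val xmls = sc.binaryFiles(xmlDir, minPartitions = 1000)

    // Trivial action so the sketch is complete: total bytes across all files.
    val totalBytes = xmls.map { case (_, data: PortableDataStream) => data.toArray().length.toLong }.sum()
    println(s"read $totalBytes bytes")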

Re: spark timesout maybe due to binaryFiles() with more than 1 million files in HDFS

2015-06-08 Thread Konstantinos Kougios
No luck I am afraid. After giving the namenode 16GB of RAM, I am still getting an out of memory exception, a somewhat different one: 15/06/08 15:35:52 ERROR yarn.ApplicationMaster: User class threw exception: GC overhead limit exceeded java.lang.OutOfMemoryError: GC overhead limit exceeded at …
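
The error above comes from yarn.ApplicationMaster, i.e. the driver side, so if raising driver memory is not enough, a common workaround for the many-small-files problem is to avoid one giant listing altogether: either pack the XML into fewer, larger files up front, or read the tree in slices. A rough sketch of the second idea, assuming the files are grouped into subdirectories; the directory layout, batch size and the byte-count action are all made up for illustration and not taken from the thread:

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("xml-in-slices"))
    val xmlDir = "hdfs:///data/xml"   // placeholder; assumes one level of subdirectories

    // List only the subdirectories (cheap), then read them a few at a time so no
    // single binaryFiles() call has to enumerate millions of files at once.
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val subDirs = fs.listStatus(new Path(xmlDir)).filter(_.isDirectory).map(_.getPath.toString)

    subDirs.grouped(50).zipWithIndex.foreach { case (batch, i) =>
      // binaryFiles() accepts a comma separated list of paths.
      val bytes = sc.binaryFiles(batch.mkString(","))
        .map { case (_, data) => data.toArray().length.toLong }
        .sum()
      println(s"batch $i: ${batch.length} dirs, $bytes bytes")
    }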

RE: spark timesout maybe due to binaryFiles() with more than 1 million files in HDFS

2015-06-08 Thread Ewan Leith
… Subject: Re: spark timesout maybe due to binaryFiles() with more than 1 million files in HDFS No luck I am afraid. After giving the namenode 16GB of RAM, I am still getting an out of memory exception, a somewhat different one: 15/06/08 15:35:52 ERROR yarn.ApplicationMaster: User class threw exception: GC overhead limit exceeded …

Re: spark timesout maybe due to binaryFiles() with more than 1 million files in HDFS

2015-06-08 Thread Konstantinos Kougios
… binaryFiles() with more than 1 million files in HDFS No luck I am afraid. After giving the namenode 16GB of RAM, I am still getting an out of memory exception, a somewhat different one: 15/06/08 15:35:52 ERROR yarn.ApplicationMaster: User class threw exception: GC overhead limit exceeded java.lang.OutOfMemoryError: GC overhead limit exceeded …