Thanks, I did that, and now I am getting an out of memory error. But I am not
sure where it occurs. It can't be on the Spark executor, as I have 28GB
allocated to it. It is not the driver, because I run that locally and
monitor it via jvisualvm. Unfortunately I can't jmx-monitor hadoop.
Try putting a * on the end of xmlDir, i.e.

    xmlDir = hdfs:///abc/def/*

rather than

    xmlDir = hdfs:///abc/def
and see what happens. I don't know why, but that appears to be more reliable
for me with S3 as the filesystem.
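For example, with binaryFiles (a rough, untested sketch - the path is just a
placeholder):

    val xmlDir = "hdfs:///abc/def/*"       // note the trailing glob
    val xmlFiles = sc.binaryFiles(xmlDir)  // RDD[(String, PortableDataStream)]
    println(xmlFiles.count())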
I'm also using binaryFiles, but I've tried running the same command while
No luck, I am afraid. After giving the namenode 16GB of RAM, I am still
getting an out of memory exception, though a somewhat different one:
15/06/08 15:35:52 ERROR yarn.ApplicationMaster: User class threw exception: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at
Can you do a simple
    sc.binaryFiles("hdfs:///path/to/files/*").count()
in the spark-shell and verify that part works?
Ewan
-----Original Message-----
From: Konstantinos Kougios [mailto:kostas.koug...@googlemail.com]
Sent: 08 June 2015 15:40
To: Ewan Leith; user@spark.apache.org
Subject: Re:
It was giving the same error, which made me figure out it is the driver
after all, but the driver running on the Hadoop cluster (the YARN
application master), not the local one. So I passed

    --conf spark.driver.memory=8g
and now it is processing the files!
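For the record, the full command looks roughly like this (the class and jar
names are just placeholders for my job):

    # driver memory must be set at submit time, since in yarn-cluster
    # mode the driver runs inside the application master on the cluster
    spark-submit --master yarn-cluster \
      --conf spark.driver.memory=8g \
      --class com.example.MyXmlJob \
      my-xml-job.jar

That also explains why jvisualvm on my local machine never showed the problem.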
Cheers