Hi Jonathan,

Sounds like something is very wrong here.
Are you running the job on an actual cluster, or are you using the local job tracker (i.e. running the import job on a single computer)?

Normally an import job, regardless of the size of the input, should run with map and reduce tasks that have a standard (e.g. 2GB) heap size per task (although there will typically be multiple tasks started on the cluster). There shouldn't be any need for anything like a 48GB heap.

If you are running this on an actual cluster, could you elaborate on where/how you're setting the 48GB heap size?

- Gabriel

On Fri, Dec 18, 2015 at 1:46 AM, Cox, Jonathan A <ja...@sandia.gov> wrote:

> I am trying to ingest a 575MB CSV file with 192,444 lines using the
> CsvBulkLoadTool MapReduce job. When running this job, I find that I have to
> boost the max Java heap space to 48GB (24GB fails with Java out of memory
> errors).
>
> I’m concerned about scaling issues. It seems like it shouldn’t require
> between 24-48GB of memory to ingest a 575MB file. However, I am pretty new
> to Hadoop/HBase/Phoenix, so maybe I am off base here.
>
> Can anybody comment on this observation?
>
> Thanks,
> Jonathan
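
For reference, per-task heap for a MapReduce bulk load is normally controlled through the standard Hadoop job properties rather than one huge client-side -Xmx. A minimal sketch of what an invocation with typical per-task settings might look like; the jar version, table name, input path, and ZooKeeper quorum below are placeholders, not values taken from this thread:

    # per-task container size and JVM heap are passed as -D generic options to the tool
    hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dmapreduce.map.memory.mb=3072 \
        -Dmapreduce.map.java.opts=-Xmx2g \
        -Dmapreduce.reduce.memory.mb=3072 \
        -Dmapreduce.reduce.java.opts=-Xmx2g \
        --table EXAMPLE_TABLE \
        --input /data/example.csv \
        --zookeeper zk-host:2181

With settings along these lines, each map and reduce task gets roughly a 2GB heap, and the client process that submits the job should need nowhere near 48GB.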