Hi Jonathan,

Sounds like something is very wrong here.

Are you running the job on an actual cluster, or are you using the
local job runner (i.e. running the import job on a single computer)?
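
One quick way to check is to look at mapreduce.framework.name in your
client config ("local", or no entry at all, means the whole job runs
inside a single client JVM rather than as distributed tasks). The
path below assumes a typical /etc/hadoop/conf layout, so adjust as
needed:

    # "local" (or missing) => local job runner; "yarn" => real cluster
    grep -A1 "mapreduce.framework.name" /etc/hadoop/conf/mapred-site.xml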

Normally an import job, regardless of the size of the input, runs its
map and reduce tasks with a standard (e.g. 2GB) heap per task,
although there will typically be multiple tasks running in parallel
on the cluster. There shouldn't be any need for anything like a 48GB
heap.
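
For reference, here's a rough sketch of how the per-task heap is
usually controlled when launching the bulk load from the command line.
The jar name, table name and input path are just placeholders; the -D
properties are the standard MRv2 ones:

    # per-task heap comes from mapreduce.{map,reduce}.java.opts;
    # the container size comes from mapreduce.{map,reduce}.memory.mb
    hadoop jar phoenix-<version>-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dmapreduce.map.memory.mb=3072 \
        -Dmapreduce.map.java.opts=-Xmx2g \
        -Dmapreduce.reduce.memory.mb=3072 \
        -Dmapreduce.reduce.java.opts=-Xmx2g \
        --table EXAMPLE \
        --input /path/to/input.csv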

If you are running this on an actual cluster, could you elaborate on
where/how you're setting the 48GB heap size?
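
In case it helps narrow things down, the usual places a client-side
-Xmx like that lives (as opposed to the per-task settings above) are
the Hadoop client environment variables. This is just an illustration
of where such a setting commonly ends up, not a recommendation:

    # client JVM only -- the process that submits the job (and that runs
    # the entire job when the local job runner is in use)
    export HADOOP_CLIENT_OPTS="-Xmx48g"

    # or, in hadoop-env.sh, the client/daemon heap in MB
    export HADOOP_HEAPSIZE=49152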

- Gabriel


On Fri, Dec 18, 2015 at 1:46 AM, Cox, Jonathan A <ja...@sandia.gov> wrote:
> I am trying to ingest a 575MB CSV file with 192,444 lines using the
> CsvBulkLoadTool MapReduce job. When running this job, I find that I have to
> boost the max Java heap space to 48GB (24GB fails with Java out of memory
> errors).
>
>
>
> I’m concerned about scaling issues. It seems like it shouldn’t require
> between 24-48GB of memory to ingest a 575MB file. However, I am pretty new
> to Hadoop/HBase/Phoenix, so maybe I am off base here.
>
>
>
> Can anybody comment on this observation?
>
>
>
> Thanks,
>
> Jonathan
