Hi Jonathan,

Sounds like something is very wrong here.
Are you running the job on an actual cluster, or are you using the local job tracker (i.e. running the import job on a single computer)?

Normally an import job, regardless of the size of the input, should run with map and reduce tasks that have a standard (e.g. 2GB) heap size per task (although there will typically be multiple tasks started on the cluster). There shouldn't be any need for anything like a 48GB heap.

If you are running this on an actual cluster, could you elaborate on where/how you're setting the 48GB heap size?

- Gabriel

On Fri, Dec 18, 2015 at 1:46 AM, Cox, Jonathan A <ja...@sandia.gov> wrote:

> I am trying to ingest a 575MB CSV file with 192,444 lines using the
> CsvBulkLoadTool MapReduce job. When running this job, I find that I have to
> boost the max Java heap space to 48GB (24GB fails with Java out of memory
> errors).
>
> I’m concerned about scaling issues. It seems like it shouldn’t require
> between 24-48GB of memory to ingest a 575MB file. However, I am pretty new
> to Hadoop/HBase/Phoenix, so maybe I am off base here.
>
> Can anybody comment on this observation?
>
> Thanks,
> Jonathan
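
For reference, per-task heap for a MapReduce bulk load is normally controlled through the standard Hadoop job properties rather than one huge client-side -Xmx. A minimal sketch of what an invocation with typical per-task settings might look like; the jar version, table name, input path, and ZooKeeper quorum below are placeholders, not values taken from this thread:

    # per-task container size and JVM heap are passed as -D generic options to the tool
    hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dmapreduce.map.memory.mb=3072 \
        -Dmapreduce.map.java.opts=-Xmx2g \
        -Dmapreduce.reduce.memory.mb=3072 \
        -Dmapreduce.reduce.java.opts=-Xmx2g \
        --table EXAMPLE_TABLE \
        --input /data/example.csv \
        --zookeeper zk-host:2181

With settings along these lines, each map and reduce task gets roughly a 2GB heap, and the client process that submits the job should need nowhere near 48GB.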