Got the same problem here. My bulk load job failed due to lack of memory at the reduce phase: 15M rows each day into a Phoenix table with an additional index.
In the end, I recreated my tables with salting. It helped a lot, because the bulk load job launched the same number of reducers as there are salt buckets (a rough sketch of a salted table definition is at the end of this mail). Still, I believe that if you're bulk loading a very large dataset it could fail again; AFAIK, the current MR-based bulk loading requires a lot of memory for writing the target files.

- Youngwoo

On Sat, Dec 19, 2015 at 5:35 AM, Cox, Jonathan A <ja...@sandia.gov> wrote:
> Hi Gabriel,
>
> The Hadoop version is 2.6.2.
>
> -Jonathan
>
> -----Original Message-----
> From: Gabriel Reid [mailto:gabriel.r...@gmail.com]
> Sent: Friday, December 18, 2015 11:58 AM
> To: user@phoenix.apache.org
> Subject: Re: [EXTERNAL] Re: Java Out of Memory Errors with CsvBulkLoadTool
>
> Hi Jonathan,
>
> Which Hadoop version are you using? I'm actually wondering if
> mapred.child.java.opts is still supported in Hadoop 2.x (I think it has
> been replaced by mapreduce.map.java.opts and mapreduce.reduce.java.opts).
>
> The HADOOP_CLIENT_OPTS won't make a difference if you're running in
> (pseudo) distributed mode, as separate JVMs will be started up for the
> tasks.
>
> - Gabriel
>
>
> On Fri, Dec 18, 2015 at 7:33 PM, Cox, Jonathan A <ja...@sandia.gov> wrote:
> > Gabriel,
> >
> > I am running the job on a single machine in pseudo distributed mode.
> > I've set the max Java heap size in two different ways (just to be sure):
> >
> > export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx48g"
> >
> > and also in mapred-site.xml:
> > <property>
> >   <name>mapred.child.java.opts</name>
> >   <value>-Xmx48g</value>
> > </property>
> >
> > -----Original Message-----
> > From: Gabriel Reid [mailto:gabriel.r...@gmail.com]
> > Sent: Friday, December 18, 2015 8:17 AM
> > To: user@phoenix.apache.org
> > Subject: [EXTERNAL] Re: Java Out of Memory Errors with CsvBulkLoadTool
> >
> > Hi Jonathan,
> >
> > Sounds like something is very wrong here.
> >
> > Are you running the job on an actual cluster, or are you using the local
> > job tracker (i.e. running the import job on a single computer)?
> >
> > Normally an import job, regardless of the size of the input, should run
> > with map and reduce tasks that have a standard (e.g. 2GB) heap size per
> > task (although there will typically be multiple tasks started on the
> > cluster). There shouldn't be any need to have anything like a 48GB heap.
> >
> > If you are running this on an actual cluster, could you elaborate on
> > where/how you're setting the 48GB heap size setting?
> >
> > - Gabriel
> >
> >
> > On Fri, Dec 18, 2015 at 1:46 AM, Cox, Jonathan A <ja...@sandia.gov> wrote:
> >> I am trying to ingest a 575MB CSV file with 192,444 lines using the
> >> CsvBulkLoadTool MapReduce job. When running this job, I find that I
> >> have to boost the max Java heap space to 48GB (24GB fails with Java
> >> out of memory errors).
> >>
> >> I’m concerned about scaling issues. It seems like it shouldn’t
> >> require between 24-48GB of memory to ingest a 575MB file. However, I
> >> am pretty new to Hadoop/HBase/Phoenix, so maybe I am off base here.
> >>
> >> Can anybody comment on this observation?
> >>
> >> Thanks,
> >>
> >> Jonathan
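
To make the salting suggestion concrete, here is a minimal sketch of a salted table plus a secondary index. The table name, columns, and bucket count are made up for illustration; the SALT_BUCKETS table option and CREATE INDEX are standard Phoenix DDL.

    -- Salted table: Phoenix pre-splits it into SALT_BUCKETS regions up front,
    -- which is why the bulk load job above ran one reducer per bucket.
    CREATE TABLE event_log (
        event_time TIMESTAMP NOT NULL,
        event_id   VARCHAR   NOT NULL,
        payload    VARCHAR,
        CONSTRAINT pk PRIMARY KEY (event_time, event_id)
    ) SALT_BUCKETS = 16;

    -- The "additional index" mentioned above, as a global secondary index.
    CREATE INDEX event_id_idx ON event_log (event_id) INCLUDE (payload);

Whether 16 buckets is the right number depends on the cluster; the Phoenix docs suggest keeping the bucket count roughly in line with the number of region servers.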