Too many map tasks are trying to commit (save) data to HBase concurrently; I
bet you have compaction hell in your cluster during data loading.

In a few words, your cluster is not able to keep up with the data ingestion
rate. HBase does not do smart update/insert rate throttling for you. You may
try some compaction-related configuration options:
   hbase.hstore.blockingWaitTime - Default: 90000
   hbase.hstore.compaction.min - Default: 3
   hbase.hstore.compaction.max - Default: 10
   hbase.hstore.compaction.min.size - Default: 128 MB, expressed in bytes
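
These are set in hbase-site.xml on the region servers. Just as a sketch (the
values below are illustrative, not tuning recommendations):

   <!-- illustrative values only, not recommendations -->
   <property>
     <name>hbase.hstore.blockingWaitTime</name>
     <value>90000</value>
   </property>
   <property>
     <name>hbase.hstore.compaction.min</name>
     <value>4</value>
   </property>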

but I suggest you pre-split your tables first, then limit the number of map
tasks (if the former does not help), then play with the compaction config
values above.
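
For pre-splitting, a sketch only (the bucket count of 50 is arbitrary and the
column list is abbreviated; your table already uses SALT_BUCKETS=10, the idea
is simply to start out with more regions). Salting makes Phoenix pre-create
one region per bucket, so writes spread across region servers from the start;
an unsalted table can instead be pre-split with SPLIT ON at CREATE time.

   CREATE TABLE IF NOT EXISTS t1_csv_data
   (
       timestamp BIGINT NOT NULL,
       location VARCHAR NOT NULL,
       fileid VARCHAR NOT NULL,
       recnum INTEGER NOT NULL,
       -- ... remaining columns ...
       CONSTRAINT pkey PRIMARY KEY (timestamp, location, fileid, recnum)
   )
   IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS=50; -- 50 is illustrative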


-Vladimir Rodionov

On Thu, Nov 6, 2014 at 12:31 PM, Perko, Ralph J <ralph.pe...@pnnl.gov>
wrote:

>  Hi, I am using a combination of Pig, Phoenix and HBase to load data on a
> test cluster and I continue to run into an issue with larger, longer
> running jobs (smaller jobs succeed).  After the job has run for several
> hours, the first set of mappers have finished and the second begin, the job
> dies with each mapper failing with the error RegionTooBusyException.  Could
> this be related to how I have my Phoenix tables configured or is this an
> Hbase configuration issue or something else?  Do you have any suggestions?
>
>  Thanks for the help,
> Ralph
>
>
>  2014-11-05 23:08:31,573 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 200 actions to
> finish
> 2014-11-05 23:08:33,729 WARN [phoenix-1-thread-34413]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, table=T1_CSV_DATA,
> primary, attempt=36/35 failed 200 ops, last exception: null on
> server1,60020,1415229553858, tracking started Wed Nov 05 22:59:40 PST 2014;
> not retrying 200 - final failure
> 2014-11-05 23:08:33,736 WARN [main] org.apache.hadoop.mapred.YarnChild:
> Exception running child : java.io.IOException: Exception while committing
> to database.
> at
> org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:79)
> at
> org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:41)
> at
> org.apache.phoenix.pig.PhoenixHBaseStorage.putNext(PhoenixHBaseStorage.java:151)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
> at
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
> at
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
> at
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.phoenix.execute.CommitException:
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
> 200 actions: RegionTooBusyException: 200 times,
> at org.apache.phoenix.execute.MutationState.commit(MutationState.java:418)
> at
> org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:356)
> at
> org.apache.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:76)
> ... 19 more
> Caused by:
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
> 200 actions: RegionTooBusyException: 200 times,
> at
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
> at
> org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
> at
> org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1473)
> at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:855)
> at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:869)
> at org.apache.phoenix.execute.MutationState.commit(MutationState.java:399)
> ... 21 more
>
>  2014-11-05 23:08:33,739 INFO [main] org.apache.hadoop.mapred.Task:
> Runnning cleanup for the task
> 2014-11-05 23:08:33,773 INFO [Thread-11]
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation:
> Closing zookeeper sessionid=0x2497d0ab7e6007e
>
>  Data size:
> 75 csv files compressed with bz2
> 17g compressed – 165g Uncompressed
>
>  Time-series data, 6 node cluster, 5 region servers.  Hadoop 2.5  (HDP
> 2.1.5).  Phoenix 4.0, Hbase 0.98,
>
>  Phoenix Table def:
>
>  CREATE TABLE IF NOT EXISTS
> t1_csv_data
> (
> timestamp BIGINT NOT NULL,
> location VARCHAR NOT NULL,
> fileid VARCHAR NOT NULL,
> recnum INTEGER NOT NULL,
> field5 VARCHAR,
> ...
> field45 VARCHAR,
> CONSTRAINT pkey PRIMARY KEY (timestamp,
> location, fileid,recnum)
> )
> IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=10;
>
>  -- indexes
> CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data(somefield1)
> COMPRESSION='SNAPPY';
> CREATE INDEX t1_csv_data_f2_idx ON t1_csv_data(somefield2)
> COMPRESSION='SNAPPY';
> CREATE INDEX t1_csv_data_f3_idx ON t1_csv_data(somefield3)
> COMPRESSION='SNAPPY';
>
>  Simple Pig script:
>
>  register $phoenix_jar;
> register $udf_jar;
>  Z = load '$data' as (
> file_id,
> recnum,
> dtm:chararray,
> ...
> -- lots of other fields
> );
>  D = foreach Z generate
> gov.pnnl.pig.TimeStringToPeriod(dtm,'yyyyMMdd HH:mm:ss','yyyyMMddHHmmss'),
> location,
> fileid,
> recnum,
> ...
> -- lots of other fields
> ;
>  STORE D into
> 'hbase://$table_name' using
> org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize 1000');
>
>
