Hi Steven,

I did 1) and 2), and the error occurred during LoadIncrementalHFiles. I can't do 3) because that CF is mostly used for MapReduce inputs, so a continuous rowkey is preferred.

Jianshi
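For context, the failing step is the client-side bulk load. A minimal sketch of how LoadIncrementalHFiles is typically driven from Java; the table name and HFile directory are placeholders, not taken from this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class BulkLoadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable"); // placeholder table name
            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
            // Hands each HFile to the region server covering its key range;
            // this is the call that surfaces the RegionTooBusyException quoted below.
            loader.doBulkLoad(new Path("/path/to/hfiles"), table);
            table.close();
        }
    }

The same step can be run from the command line with "hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /path/to/hfiles mytable".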
On Sat, Sep 6, 2014 at 12:29 AM, Magana-zook, Steven Alan <maganazo...@llnl.gov> wrote:

> Jianshi,
>
> I have seen many solutions to importing this kind of data:
>
> 1. Pre-splitting regions (I did not try this)
>
> 2. Using a MapReduce job to create HFiles instead of putting individual
>    rows into the database (instructions here:
>    http://hbase.apache.org/book/arch.bulk.load.html)
>
> 3. Modifying the row key so it is not monotonic
>
> I went with the third solution, prepending a random integer to the other
> fields in my composite row key ("<random int>_<key field 1>_<key field 2>…").
>
> When you make any changes, you can verify they are working by viewing the
> HBase web interface (port 60010 on the HBase master) to see the requests
> per second on the various region servers.
>
> Thank you,
> Steven Magana-Zook
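A minimal sketch of Steven's third option, salting the composite row key with a random prefix; the field names and bucket count are illustrative placeholders, not taken from this thread:

    import java.util.Random;

    public class SaltedKeySketch {
        // Number of salt buckets; often chosen close to the number of region servers.
        static final int SALT_BUCKETS = 16;
        static final Random RANDOM = new Random();

        // Builds "<random int>_<key field 1>_<key field 2>" as described above.
        static String saltedKey(String keyField1, String keyField2) {
            int salt = RANDOM.nextInt(SALT_BUCKETS);
            return salt + "_" + keyField1 + "_" + keyField2;
        }
    }

One trade-off: a purely random salt spreads writes across regions but forces a reader to check every bucket; deriving the salt from a hash of the key fields instead keeps single-row lookups deterministic while still spreading the write load.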
> On 9/5/14 9:14 AM, "Jianshi Huang" <jianshi.hu...@gmail.com> wrote:
>
>> Thanks Ted, I'll try to do a major compact.
>>
>> Hi Steven,
>>
>> Yes, most of my rows are hashed to make them randomly distributed, but one
>> column family has monotonically increasing rowkeys, and it's used for
>> recording sequences of events.
>>
>> Do you have a solution for bulk importing this kind of data?
>>
>> Jianshi
>>
>> On Sat, Sep 6, 2014 at 12:00 AM, Magana-zook, Steven Alan <maganazo...@llnl.gov> wrote:
>>
>>> Hi Jianshi,
>>>
>>> What are the field(s) in your row key? If your row key is monotonically
>>> increasing, then you will be sending all of your requests to one region
>>> server. Even after the region splits, all new entries will keep punishing
>>> one server (the region responsible for the split containing the new keys).
>>>
>>> See these articles that may help if this is indeed your issue:
>>> 1. http://hbase.apache.org/book/rowkey.design.html
>>> 2. http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
>>>
>>> Regards,
>>> Steven Magana-Zook
>>>
>>> On 9/5/14 8:54 AM, "Jianshi Huang" <jianshi.hu...@gmail.com> wrote:
>>>
>>>> Hi JM,
>>>>
>>>> What do you mean by the 'destination cluster'? The files are in the same
>>>> Hadoop/HDFS cluster where HBase is running.
>>>>
>>>> Do you mean doing the bulk import on the HBase Master node?
>>>>
>>>> Jianshi
>>>>
>>>> On Fri, Sep 5, 2014 at 11:18 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>>>>
>>>>> Hi Jianshi,
>>>>>
>>>>> You might want to upload the files to the destination cluster first and
>>>>> then re-run your bulk load from there. That way the transfer time will
>>>>> not count against the timeout, since the files will be local.
>>>>>
>>>>> JM
>>>>>
>>>>> 2014-09-05 11:15 GMT-04:00 Jianshi Huang <jianshi.hu...@gmail.com>:
>>>>>
>>>>>> I'm importing 2 TB of generated HFiles into HBase and I constantly get
>>>>>> the following errors:
>>>>>>
>>>>>> Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.RegionTooBusyException):
>>>>>> org.apache.hadoop.hbase.RegionTooBusyException: failed to get a lock in 60000 ms.
>>>>>> regionName=grapple_edges_v2,ff000000,1409817320781.6d2955c780b39523de733f3565642d96.,
>>>>>> server=xxxxx.xxx.xxx,60020,1404854700728
>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5851)
>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5837)
>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.startBulkRegionOperation(HRegion.java:5795)
>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3543)
>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3525)
>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFile(HRegionServer.java:3277)
>>>>>>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28863)
>>>>>>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
>>>>>>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
>>>>>>   at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
>>>>>>   at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
>>>>>>   at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
>>>>>>   at java.lang.Thread.run(Thread.java:724)
>>>>>>
>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1498)
>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1684)
>>>>>>   at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1737)
>>>>>>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.bulkLoadHFile(ClientProtos.java:29276)
>>>>>>   at org.apache.hadoop.hbase.protobuf.ProtobufUtil.bulkLoadHFile(ProtobufUtil.java:1548)
>>>>>>   ... 11 more
>>>>>>
>>>>>> What makes the region too busy? Is there a way to improve this?
>>>>>>
>>>>>> Does that also mean some of my data was not correctly imported?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jianshi Huang

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
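For completeness, pre-splitting (solution 1) is normally done at table-creation time, so the initial bulk load is spread over many region servers instead of one. A minimal sketch, assuming hex-prefixed (hashed or salted) rowkeys like the region boundary "ff000000" above; the table and column-family names are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("mytable"));
            desc.addFamily(new HColumnDescriptor("cf"));
            // 15 split points give 16 regions, one per hex first character,
            // so keys beginning with 0..f land on different regions from the start.
            byte[][] splits = new byte[15][];
            for (int i = 1; i <= 15; i++) {
                splits[i - 1] = Bytes.toBytes(Integer.toHexString(i));
            }
            admin.createTable(desc, splits);
            admin.close();
        }
    }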