Hi Steven,

I did 1) and 2) and the error was during LoadIncrementalHFiles.
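
(For reference, the load step here is the standard LoadIncrementalHFiles
driver; a rough sketch with the 0.98-era API is below. The table name is the
one from the stack trace further down, and the HFile directory is just a
placeholder.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class BulkLoadDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "grapple_edges_v2");
            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
            // Moves the pre-generated HFiles into the regions that own
            // their key ranges; this is the step that hits the timeout.
            loader.doBulkLoad(new Path("/tmp/hfiles"), table);
            table.close();
        }
    }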

I can't do 3) because that CF is mostly used for MapReduce inputs, so a
continuous rowkey is preferred.

Jianshi



On Sat, Sep 6, 2014 at 12:29 AM, Magana-zook, Steven Alan <
maganazo...@llnl.gov> wrote:

> Jianshi,
>
> I have seen many solutions to importing this kind of data:
>
> 1. Pre-splitting regions (I did not try this; a rough sketch follows this list)
>
> 2. Using a MapReduce job to create HFiles instead of putting individual
> rows into the database
> (instructions here: http://hbase.apache.org/book/arch.bulk.load.html)
>
> 3. Modifying the row key to not be monotonic
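>
> A rough sketch of option 1 with the 0.98-era admin API (the table name,
> column family, and split points are illustrative; real split points should
> match your key distribution):
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.HColumnDescriptor;
>     import org.apache.hadoop.hbase.HTableDescriptor;
>     import org.apache.hadoop.hbase.TableName;
>     import org.apache.hadoop.hbase.client.HBaseAdmin;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class PreSplitTable {
>         public static void main(String[] args) throws Exception {
>             Configuration conf = HBaseConfiguration.create();
>             HBaseAdmin admin = new HBaseAdmin(conf);
>             HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("my_table"));
>             desc.addFamily(new HColumnDescriptor("cf"));
>             // Pre-creating regions spreads the initial key space across
>             // several region servers instead of funnelling every write
>             // (or bulk-loaded HFile) into a single region.
>             byte[][] splitPoints = new byte[][] {
>                 Bytes.toBytes("20000000"),
>                 Bytes.toBytes("40000000"),
>                 Bytes.toBytes("60000000"),
>                 Bytes.toBytes("80000000")
>             };
>             admin.createTable(desc, splitPoints);
>             admin.close();
>         }
>     }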
>
> I went with the third solution by prepending a random integer before the
> other fields in my composite row key ("<random int>_<key field 1>_<key
> field 2>…").
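>
> A minimal sketch of that kind of salted key (the field names and salt range
> are made up for illustration):
>
>     import java.nio.charset.StandardCharsets;
>     import java.util.concurrent.ThreadLocalRandom;
>
>     public class SaltedRowKey {
>         // Prepend a bounded random salt so monotonically increasing keys
>         // scatter across regions instead of all landing in the last one.
>         public static byte[] build(String keyField1, String keyField2) {
>             int salt = ThreadLocalRandom.current().nextInt(100); // 0-99
>             String key = salt + "_" + keyField1 + "_" + keyField2;
>             return key.getBytes(StandardCharsets.UTF_8);
>         }
>     }
>
> The trade-off is that reads and scans then have to fan out over every salt
> prefix, since a logical key no longer lives in one contiguous range.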
>
> When you make any changes, you can verify it is working by viewing the
> HBase web interface (port 60010 on the HBase master) to see the requests
> per second on the various region servers.
>
>
> Thank you,
> Steven Magana-Zook
>
>
>
>
>
>
> On 9/5/14 9:14 AM, "Jianshi Huang" <jianshi.hu...@gmail.com> wrote:
>
> >Thanks Ted, I'll try to do a major compact.
> >
> >Hi Steven,
> >
> >Yes, most of my rows are hashed to make them randomly distributed, but one
> >column family has monotonically increasing rowkeys, and it's used for
> >recording a sequence of events.
> >
> >Do you have a solution for how to bulk import this kind of data?
> >
> >Jianshi
> >
> >
> >
> >On Sat, Sep 6, 2014 at 12:00 AM, Magana-zook, Steven Alan <
> >maganazo...@llnl.gov> wrote:
> >
> >> Hi Jianshi,
> >>
> >> What are the field(s) in your row key? If your row key is monotonically
> >> increasing then you will be sending all of your requests to one region
> >> server. Even after the region splits, all new entries will keep punishing
> >> one server (the region responsible for the split containing the new keys).
> >>
> >> See these articles that may help if this is indeed your issue:
> >> 1. http://hbase.apache.org/book/rowkey.design.html
> >> 2. http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
> >>
> >> Regards,
> >> Steven Magana-Zook
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 9/5/14 8:54 AM, "Jianshi Huang" <jianshi.hu...@gmail.com> wrote:
> >>
> >> >Hi JM,
> >> >
> >> >What do you mean by the 'destination cluster'? The files are in the same
> >> >Hadoop/HDFS cluster where HBase is running.
> >> >
> >> >Do you mean do the bulk importing on HBase Master node?
> >> >
> >> >
> >> >Jianshi
> >> >
> >> >
> >> >On Fri, Sep 5, 2014 at 11:18 PM, Jean-Marc Spaggiari <
> >> >jean-m...@spaggiari.org> wrote:
> >> >
> >> >> Hi Jianshi,
> >> >>
> >> >> You might want to upload the file on the destination cluster first and
> >> >> then re-run your bulk load from there. That way the transfer time will
> >> >> not be taken into consideration for the timeout, since the files will
> >> >> be local.
> >> >>
> >> >> JM
> >> >>
> >> >>
> >> >> 2014-09-05 11:15 GMT-04:00 Jianshi Huang <jianshi.hu...@gmail.com>:
> >> >>
> >> >> > I'm importing 2TB of generated HFiles to HBase and I constantly get the
> >> >> > following errors:
> >> >> >
> >> >> > Caused by:
> >> >> > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.RegionTooBusyException):
> >> >> > org.apache.hadoop.hbase.RegionTooBusyException: failed to get a lock in 60000 ms.
> >> >> > regionName=grapple_edges_v2,ff000000,1409817320781.6d2955c780b39523de733f3565642d96.,
> >> >> > server=xxxxx.xxx.xxx,60020,1404854700728
> >> >> >         at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5851)
> >> >> >         at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5837)
> >> >> >         at org.apache.hadoop.hbase.regionserver.HRegion.startBulkRegionOperation(HRegion.java:5795)
> >> >> >         at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3543)
> >> >> >         at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3525)
> >> >> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFile(HRegionServer.java:3277)
> >> >> >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28863)
> >> >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
> >> >> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
> >> >> >         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
> >> >> >         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
> >> >> >         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
> >> >> >         at java.lang.Thread.run(Thread.java:724)
> >> >> >
> >> >> >         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1498)
> >> >> >         at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1684)
> >> >> >         at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1737)
> >> >> >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.bulkLoadHFile(ClientProtos.java:29276)
> >> >> >         at org.apache.hadoop.hbase.protobuf.ProtobufUtil.bulkLoadHFile(ProtobufUtil.java:1548)
> >> >> >         ... 11 more
> >> >> >
> >> >> >
> >> >> > What makes the region too busy? Is there a way to improve it?
> >> >> >
> >> >> > Does that also mean some part of my data was not correctly imported?
> >> >> >
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > --
> >> >> > Jianshi Huang
> >> >> >
> >> >> > LinkedIn: jianshi
> >> >> > Twitter: @jshuang
> >> >> > Github & Blog: http://huangjs.github.com/
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> >--
> >> >Jianshi Huang
> >> >
> >> >LinkedIn: jianshi
> >> >Twitter: @jshuang
> >> >Github & Blog: http://huangjs.github.com/
> >>
> >>
> >
> >
> >--
> >Jianshi Huang
> >
> >LinkedIn: jianshi
> >Twitter: @jshuang
> >Github & Blog: http://huangjs.github.com/
>
>


-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
