TOF (TableOutputFormat) has an HBase client HTable inside it, so both
routes write through the same client.  It's certainly easier to use TOF.
Unless you have special needs, I'd stick with TOF.
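
A back-of-the-envelope check on the figures quoted below, as a minimal
sketch: 1300 inserts per second at roughly 250 bytes per row works out
to about 325 KB/s across the whole cluster, which is why the rate looks
slow.  The class and method names are made up for illustration; only
the three input figures come from the thread, and the per-server split
assumes load is spread evenly, which the CPU observation quoted below
suggests may not be the case.

```java
// Hypothetical helper, not part of HBase or Hadoop.
class InsertThroughput {
    // Bytes per second implied by a row rate and an average row size.
    static long bytesPerSecond(long rowsPerSecond, long bytesPerRow) {
        return rowsPerSecond * bytesPerRow;
    }

    // Rows per second each server would see if load were spread evenly
    // (integer division, so the result is rounded down).
    static long rowsPerServer(long rowsPerSecond, int servers) {
        return rowsPerSecond / servers;
    }

    public static void main(String[] args) {
        long rowsPerSecond = 1300; // total insert rate reported in the thread
        long bytesPerRow = 250;    // approximate row size reported in the thread
        int regionServers = 3;     // nodes running a region server in the thread

        long total = bytesPerSecond(rowsPerSecond, bytesPerRow);
        System.out.println("total bytes/s: " + total);
        System.out.println("total KiB/s: " + total / 1024.0);
        System.out.println("rows/s per region server (even spread): "
                + rowsPerServer(rowsPerSecond, regionServers));
    }
}
```

If one server's share is much larger than the even-spread figure, that
points at a hot region rather than at the cluster as a whole.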
Good luck,
St.Ack

On Mon, Dec 27, 2010 at 1:03 PM, Nanheng Wu <[email protected]> wrote:
> Thanks for the answers. I will use these as my basis for
> investigation. I am using a mapper-only job. Is it better to use the
> HBase client to write to HBase, or TableOutputFormat?
>
> On Mon, Dec 27, 2010 at 8:38 AM, Stack <[email protected]> wrote:
>> On Mon, Dec 27, 2010 at 1:54 AM, Nanheng Wu <[email protected]> wrote:
>>> I am running some tests to load data from HDFS into HBase in a MR job.
>>> I am pretty new to HBase and I have some questions regarding bulk load
>>> performance: I have a small cluster with 4 nodes, I set up one node to
>>> run Namenode/JobTracker/ZK, and the other three nodes all run
>>> TaskTracker/DataNode/HRegion. During my test I am seeing about 1300
>>> inserts per second total and it feels kind of slow.
>>
>> I don't know what your hardware is like but yeah, it sounds kinda slow.
>>
>>
>>> My rows are pretty
>>> small ~250 bytes. I am wondering if it is a good idea to be running MR
>>> on all nodes. Would it be better if I run MR load job on separate
>>> nodes?
>>
>> Well, where do you think the time is being spent?  What do you think
>> is holding up the job?  Is your MR job doing any massaging of the data?
>> Do you have many concurrent mappers running at the same time on each node?
>> Does your MR job do a map and reduce or just a map?  Is it the insert
>> into hbase that is slow?  What do the hbase logs say?  Are they
>> blocking because they are flushing memory?
>>
>>> Also I observe that one task tracker's CPU usage was twice as
>>> high as the other two.
>>
>> Maybe it's the one that is doing the inserting?  How many regions in
>> your hbase cluster?  When you look at the hbase UI, is load being spread
>> across the hbase cluster or are you just hitting one node?
>>
>> St.Ack
>>
>>> I can't figure out why that is, does that
>>> indicate some hot spots in the cluster? I'd really appreciate some
>>> ideas, and please let me know if my description is not specific or
>>> detailed enough and what other information I can provide to help
>>> diagnose the problem. Thanks!
>>>
>>
>
