Re: Bulk load questions

Nanheng Wu Mon, 27 Dec 2010 13:04:13 -0800

Thanks for the answers. I will use these as my basis for
investigation. I am using a mapper-only job. Is it better to use the
HBase client to write to HBase, or TableOutputFormat?


On Mon, Dec 27, 2010 at 8:38 AM, Stack <[email protected]> wrote:
> On Mon, Dec 27, 2010 at 1:54 AM, Nanheng Wu <[email protected]> wrote:
>> I am running some tests to load data from HDFS into HBase in a MR job.
>> I am pretty new to HBase and I have some questions regarding bulk load
>> performance: I have a small cluster with 4 nodes, I set up one node to
>> run Namenode/JobTracker/ZK, and the other three nodes all run
>> TaskTracker/DataNode/HRegion. During my test I am seeing about 1300
>> inserts per second total and it feels kind of slow.
>
> I don't know what your hardware is like but yeah, it sounds kinda slow.
>
>
>> My rows are pretty
>> small ~250 bytes. I am wondering if it is a good idea to be running MR
>> on all nodes. Would it be better if I run MR load job on separate
>> nodes?
>
> Well, where do you think the time is being spent?  What is holding up
> the job, do you think?  Is your MR job doing any massaging of the data?
> Do you have many concurrent mappers running at the same time on each
> node?  Does your MR job do a map and reduce or just a map?  Is it the
> insert into hbase that is slow?  What do the hbase logs say?  Are they
> blocking because they are flushing memory?
>
>> Also I observe that one task tracker's CPU usage was twice as
>> high as the other two.
>
> Maybe it's the one that is doing the inserting?  How many regions in
> your hbase cluster?  When you look at hbase UI, is load being spread
> across the hbase cluster or are you just hitting one node?
>
> St.Ack
>
>> I can't figure out why that is, does that
>> indicate some hot spots in the cluster? I'd really appreciate some
>> ideas, and please let me know if my description is not specific or
>> detailed enough and what other information I can provide to help
>> diagnose the problem. Thanks!
>>
>
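
To put the numbers quoted in this thread in perspective, here is a rough back-of-the-envelope sketch. It uses only the figures from the messages (about 1300 inserts per second in total, rows of roughly 250 bytes, three nodes running region servers). The function name `write_throughput` is made up for illustration, and the per-server split assumes the load is spread evenly, which the thread suggests may not be the case. It also counts row payload only, not key or column overhead.

```python
# Figures taken from the thread; everything else is derived from them.
INSERTS_PER_SEC = 1300   # total inserts per second reported for the job
ROW_BYTES = 250          # approximate row size in bytes
REGION_SERVERS = 3       # nodes running TaskTracker/DataNode/HRegion


def write_throughput(inserts_per_sec, row_bytes, servers):
    """Return (total bytes/s, inserts/s per server, bytes/s per server).

    Hypothetical helper: assumes writes are spread evenly across servers.
    """
    total_bytes = inserts_per_sec * row_bytes
    return total_bytes, inserts_per_sec / servers, total_bytes / servers


total, rows_each, bytes_each = write_throughput(
    INSERTS_PER_SEC, ROW_BYTES, REGION_SERVERS)

print(f"total payload:      {total / 1024:.1f} KiB/s")
print(f"inserts per server: {rows_each:.0f} per second")
print(f"payload per server: {bytes_each / 1024:.1f} KiB/s")
```

At these figures the whole job writes about 325,000 bytes of row data per second, so raw data volume is unlikely to be the limit. That is consistent with the questions above about where the time is actually going.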
