TOF (TableOutputFormat) has an HBase client HTable in it, so either way the writes go through the same client. It's certainly easier using TOF. Unless you have special needs, I'd stick w/ TOF.
Good luck,
St.Ack
On Mon, Dec 27, 2010 at 1:03 PM, Nanheng Wu <[email protected]> wrote:
> Thanks for the answers. I will use these as my basis for
> investigation. I am using a mapper only job, is it better to use the
> HBase client to write to HBase or TableOutputFormat?
>
> On Mon, Dec 27, 2010 at 8:38 AM, Stack <[email protected]> wrote:
>> On Mon, Dec 27, 2010 at 1:54 AM, Nanheng Wu <[email protected]> wrote:
>>> I am running some tests to load data from HDFS into HBase in a MR job.
>>> I am pretty new to HBase and I have some questions regarding bulk load
>>> performance: I have a small cluster with 4 nodes, I set up one node to
>>> run Namenode/JobTracker/ZK, and the other three nodes all run
>>> TaskTracker/DataNode/HRegion. During my test I am seeing about 1300
>>> inserts per second total and it feels kind of slow.
>>
>> I don't know what your hardware is like but yeah, it sounds kinda slow.
>>
>>
>>> My rows are pretty
>>> small ~250 bytes. I am wondering if it is a good idea to be running MR
>>> on all nodes. Would it be better if I run MR load job on separate
>>> nodes?
>>
>> Well, where do you think the time is being spent? What is holding up
>> the job do you think? Is your MR job doing any massaging of the data.
>> Do you have many concurrent mappers run at same time on each node?
>> Does your MR job do a map and reduce or just a map? Is it the insert
>> into hbase that is slow? What do the hbase logs say? Are they
>> blocking because they are flushing memory?
>>
>>> Also I observe that one task tracker's CPU usage was twice as
>>> high as the other two.
>>
>> Maybe its the one that is doing the inserting? How many regions in
>> your hbase cluster? When you look at hbase UI, is load being spread
>> across the hbase cluster or you just hitting one node?
>>
>> St.Ack
>>
>>> I can't figure out why that is, does that
>>> indicate some hot spots in the cluster? I'd really appreciate some
>>> ideas, and please let me know if my description is not specific or
>>> detailed enough and what other information I can provide to help
>>> diagnose the problem. Thanks!
>>>
>>
>
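For a sense of scale, the figures in the original question (about 1300 inserts per second, rows of roughly 250 bytes, three nodes running HRegion) can be turned into a raw data rate. The sketch below is only back-of-envelope arithmetic: it counts row payload alone, ignores row keys, column names, write-ahead logging and HDFS replication, and the even split across region servers is an assumption that the thread itself calls into question. The function name `throughput` is made up for illustration.

```python
# Back-of-envelope throughput from the figures quoted in the thread.
INSERTS_PER_SEC = 1300   # total insert rate reported in the question
ROW_BYTES = 250          # approximate row size reported ("~250 bytes")
REGION_SERVERS = 3       # nodes running TaskTracker/DataNode/HRegion


def throughput(inserts_per_sec, row_bytes, servers):
    """Return (total bytes/s, bytes/s per server assuming an even spread)."""
    total = inserts_per_sec * row_bytes
    return total, total / servers


total_bps, per_server_bps = throughput(INSERTS_PER_SEC, ROW_BYTES, REGION_SERVERS)
print(f"total payload rate: {total_bps / 1024:.1f} KiB/s")
print(f"per region server (even spread assumed): {per_server_bps / 1024:.1f} KiB/s")
```

That works out to 325,000 bytes of payload per second for the whole cluster, which is consistent with the "sounds kinda slow" reaction above. If the HBase UI shows the load landing on a single region server rather than all three, the per-server figure from the even-spread assumption does not apply.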
