My first question would be: what do you expect exactly? Would 5 min be enough? Or are you expecting something more like 1-2 secs (which is impossible since this is MapReduce)?
Then there's also Jon's questions.

Finally, did you set a higher scanner caching on that job? hbase.client.scanner.caching is the name of the config, and it defaults to 1. When mapping an HBase table, if you don't set it higher you're basically benchmarking the RPC layer, since it does 1 call per next() invocation. The right value depends on the size of your rows, e.g. are you storing 60 bytes or something big like 100KB? On our 13B-row table (each row is a few bytes), we set it to 10k. There's a quick sketch of how to set it for a map job at the bottom of this mail.

J-D

On Sat, May 22, 2010 at 8:40 AM, Andrew Nguyen <[email protected]> wrote:
> Hello,
>
> I finally got some decent hardware to put together a 1 master, 4 slave
> Hadoop/HBase cluster. However, I'm still waiting for space in the datacenter
> to clear out and only have 3 of the nodes deployed (master + 2 slaves). Each
> node is a quad-core AMD with 8G of RAM, running on a GigE network. HDFS is
> configured to run on a separate (from the OS drive) U320 drive. The master
> has RAID1 mirrored drives only.
>
> I've installed HBase with slave1 and slave2 as regionservers and master,
> slave1, slave2 as the ZK quorum. The master serves as the NN and JT and the
> slaves as DN and TT.
>
> Now my question:
>
> I've imported 22.5M rows into HBase, into a single table. Each row has 8 or
> so columns. I just ran the RowCounter MR example and it takes about 25
> minutes to complete. Is a 3 node setup too underpowered to combat the
> overhead of Hadoop and HBase? Or, could it be something with my
> configuration? I've been playing around with Hadoop some but this is my
> first attempt at anything HBase.
>
> Thanks!
>
> --Andrew
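As promised, roughly what bumping the scanner caching looks like when setting up a table-mapper job. This is only a sketch: "mytable", MyMapper and the value 1000 are placeholders, so tune the number to your own row size.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = HBaseConfiguration.create();
// One way: raise the default for every scanner this job opens.
conf.set("hbase.client.scanner.caching", "1000");

Job job = new Job(conf, "rowcount");

Scan scan = new Scan();
// Or set it on the scan itself: fetch 1000 rows per RPC instead of 1.
scan.setCaching(1000);
// A full-table scan shouldn't churn the region servers' block cache.
scan.setCacheBlocks(false);

// "mytable" and MyMapper are placeholders for your table and mapper class.
TableMapReduceUtil.initTableMapperJob("mytable", scan, MyMapper.class,
    ImmutableBytesWritable.class, Result.class, job);

With the default of 1 you pay one round trip per row, so on 22.5M rows that alone can dominate the job time; anything in the hundreds or low thousands should make a visible difference.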
