RE: RowCounter example run time

Michael Segel Sun, 23 May 2010 07:37:20 -0700

J-D,

Here's the problem: you go to any relational database, do a select count(*), 
and you get a response back fairly quickly.
The difference is that in HBase you're doing a physical count, while the 
relational engine is pulling the count from metadata.
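
To make that contrast concrete, here is a toy sketch in Python. ToyTable and 
its methods are hypothetical names made up for illustration; this is neither 
an HBase API nor how any particular relational engine is implemented. It only 
shows the difference between counting by touching every row and reading a 
counter that is kept up to date on each write.

```python
class ToyTable:
    """In-memory stand-in for a table (illustrative only, not a real API)."""

    def __init__(self):
        self._rows = {}
        self._row_count = 0  # "metadata": updated on every insert and delete

    def put(self, key, value):
        # Only a brand-new key changes the row count; overwrites do not.
        if key not in self._rows:
            self._row_count += 1
        self._rows[key] = value

    def delete(self, key):
        if key in self._rows:
            del self._rows[key]
            self._row_count -= 1

    def physical_count(self):
        # Visits every row, so the cost grows with the size of the table.
        return sum(1 for _ in self._rows)

    def metadata_count(self):
        # Reads the maintained counter, so the cost is constant.
        return self._row_count


table = ToyTable()
for i in range(1000):
    table.put(f"row-{i}", i)
table.put("row-0", "overwritten")  # overwrite, count unchanged
table.delete("row-999")

print(table.physical_count(), table.metadata_count())
```

Both calls return the same number; the difference is only in how much work 
each one does to get it.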


I have a couple of ideas on how we could do this...

-Mike

> Date: Sat, 22 May 2010 09:25:51 -0700
> Subject: Re: RowCounter example run time
> From: [email protected]
> To: [email protected]
> 
> My first question would be, what do you expect exactly? Would 5 min be
> enough? Or are you expecting something more like 1-2 secs (which is
> impossible since this is mapreduce)?
> 
> Then there's also Jon's questions.
> 
> Finally, did you set a higher scanner caching on that job?
> hbase.client.scanner.caching is the name of the config, which defaults
> to 1. When mapping an HBase table, if you don't set it higher you're
> basically benchmarking the RPC layer, since it does 1 call per next()
> invocation. The right value depends on the size of your rows,
> e.g., are you storing 60 bytes or something high like 100KB? On our 13B
> rows table (each row is a few bytes), we set it to 10k.
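
A rough back-of-the-envelope sketch of the point above, in Python, using the 
numbers from this thread (22.5M rows, about 25 minutes). The helper name 
next_rpcs is made up for illustration, and the model is a simplification: it 
assumes each scanner round trip returns up to `caching` rows and ignores 
region boundaries, scanner setup, and everything else the job does.

```python
def next_rpcs(total_rows: int, caching: int) -> int:
    """Approximate scanner round trips needed to read total_rows when each
    round trip returns up to `caching` rows (integer ceiling division)."""
    if caching < 1:
        raise ValueError("caching must be at least 1")
    return (total_rows + caching - 1) // caching


ROWS = 22_500_000          # table size from the original question
RUNTIME_SECONDS = 25 * 60  # reported RowCounter run time

# Throughput implied by the reported run.
rows_per_second = ROWS / RUNTIME_SECONDS

for caching in (1, 100, 1_000, 10_000):
    print(f"caching={caching:>6}: ~{next_rpcs(ROWS, caching):>10,} round trips")
print(f"observed throughput: {rows_per_second:,.0f} rows/s")
```

With the default of 1 this model gives one round trip per row, 22.5 million 
in total; at 10k it drops to 2,250. How much of the 25 minutes that actually 
accounts for on a given cluster would need to be measured.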
> 
> J-D
> 
> On Sat, May 22, 2010 at 8:40 AM, Andrew Nguyen
> <[email protected]> wrote:
> > Hello,
> >
> > I finally got some decent hardware to put together a 1 master, 4 slave 
> > Hadoop/HBase cluster.  However, I'm still waiting for space in the 
> > datacenter to clear out and only have 3 of the nodes deployed (master + 2 
> > slaves).  Each node is a quad-core AMD with 8G of RAM, running on a GigE 
> > network.  HDFS is configured to run on a separate (from the OS drive) U320 
> > drive.  The master has RAID1 mirrored drives only.
> >
> > I've installed HBase with slave1 and slave2 as regionservers and master, 
> > slave1, slave2 as the ZK quorum.  The master serves as the NN and JT and 
> > the slaves as DN and TT.
> >
> > Now my question:
> >
> > I've imported 22.5M rows into HBase, into a single table.  Each row has 8 
> > or so columns.  I just ran the RowCounter MR example and it takes about 25 
> > minutes to complete.  Is a 3 node setup too underpowered to combat the 
> > overhead of Hadoop and HBase?  Or, could it be something with my 
> > configuration?  I've been playing around with Hadoop some but this is my 
> > first attempt at anything HBase.
> >
> > Thanks!
> >
> > --Andrew
                                          