Mohammad,

  The NN is write-light and has fairly static memory usage; you will see
its memory usage go up only as you add blocks/files. Since HBase has memory
limitations (GC's fault) and should have roughly one file per store, it
will not put a lot of memory pressure on the NN. The JT is the same way: it
scales up usage based on the number of MR jobs, and in a sane HBase
environment you are not going to be running thousands of MR jobs against
HBase. ZK also has pretty minimal requirements - 1GB of memory, a dedicated
CPU core, and a place to write to with low I/O wait. I have always found
the NN, SNN, and JT to be the next best place to put ZK if dedicated
hardware is not available. I have seen some strange behavior with ZK
running on DN/TT/RS nodes, from unexplained timeouts to corrupt znodes
causing failures (that one was real nasty).
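
For what it's worth, here is a minimal zoo.cfg sketch along those lines - a
three-server ensemble with the snapshot and transaction-log directories on
a dedicated spindle so ZK never competes for I/O. The hostnames and the
/zk-disk mount point are just placeholders:

    # zoo.cfg - sketch only; hostnames and /zk-disk are placeholders
    tickTime=2000
    initLimit=10
    syncLimit=5
    clientPort=2181
    # keep snapshots and the txn log on a disk ZK does not share with HDFS/MR
    dataDir=/zk-disk/zookeeper/data
    dataLogDir=/zk-disk/zookeeper/datalog
    # three-server ensemble, e.g. colocated with the NN, SNN, and JT
    server.1=nn-host:2888:3888
    server.2=snn-host:2888:3888
    server.3=jt-host:2888:3888

(Each host also needs a myid file under dataDir matching its server.N entry.)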


On Sat, Jun 22, 2013 at 7:21 PM, Mohammad Tariq <donta...@gmail.com> wrote:

> Hello Iain,
>
>          You would put a lot of pressure on the RAM if you do that. The
> NN already has a high memory requirement, and having the JT+ZK on the
> same machine as well would be too heavy, IMHO.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Sun, Jun 23, 2013 at 4:07 AM, iain wright <iainw...@gmail.com> wrote:
>
> > Hi Mohammad,
> >
> > I am curious why you chose not to put the third ZK on the NN+JT? I was
> > planning on doing that on a new cluster and want to confirm it would be
> > okay.
> >
> >
> > --
> > Iain Wright
> > Cell: (562) 852-5916
> >
> > <http://www.labctsi.org/>
> >
> >
> > On Sat, Jun 22, 2013 at 10:05 AM, Mohammad Tariq <donta...@gmail.com>
> > wrote:
> >
> > > Yeah, I forgot to mention that the number of ZKs should be odd. Perhaps
> > > those parentheses made that statement look optional. Just to clarify,
> > > it is mandatory.
> > >
> > > Warm Regards,
> > > Tariq
> > > cloudfront.blogspot.com
> > >
> > >
> > > On Sat, Jun 22, 2013 at 9:45 PM, Kevin O'dell <kevin.od...@cloudera.com>
> > > wrote:
> > >
> > > > If you run ZK alongside a DN/TT/RS, please make sure to dedicate a
> > > > hard drive and a core to the ZK process. I have seen many strange
> > > > occurrences.
> > > > On Jun 22, 2013 12:10 PM, "Jean-Marc Spaggiari" <jean-m...@spaggiari.org>
> > > > wrote:
> > > >
> > > > > You HAVE TO run a ZK3, or else there is no point in having ZK2, since
> > > > > any ZK failure will be an issue. You need to have an odd number of ZK
> > > > > servers...
> > > > >
> > > > > Also, if you don't run MR jobs, you don't need the TT and JT...
> > > > > Otherwise, everything below is correct. But there are many other
> > > > > options; it all depends on your needs and the hardware you have ;)
> > > > >
> > > > > JM
> > > > >
> > > > > 2013/6/22 Mohammad Tariq <donta...@gmail.com>:
> > > > > > With 8 machines you can do something like this:
> > > > > >
> > > > > > Machine 1 - NN+JT
> > > > > > Machine 2 - SNN+ZK1
> > > > > > Machine 3 - HM+ZK2
> > > > > > Machines 4-8 - DN+TT+RS
> > > > > > (You can run ZK3 on a slave node with some additional memory.)
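> > > > > >
> > > > > > Just to illustrate how that layout would be wired up, the quorum
> > > > > > setting in hbase-site.xml would list the three ZK hosts above
> > > > > > (machine2/machine3/machine4 here are placeholder hostnames for the
> > > > > > SNN box, the HM box, and whichever slave carries ZK3):
> > > > > >
> > > > > > <property>
> > > > > >   <name>hbase.zookeeper.quorum</name>
> > > > > >   <value>machine2,machine3,machine4</value>
> > > > > > </property>
> > > > > > <property>
> > > > > >   <name>hbase.zookeeper.property.clientPort</name>
> > > > > >   <value>2181</value>
> > > > > > </property>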
> > > > > >
> > > > > > DN and RS run on the same machine. Although RSs are said to hold
> > > > > > the data, the data is actually stored in the DNs. Replication is
> > > > > > managed at the HDFS level, so you don't have to worry about that.
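> > > > > >
> > > > > > For example, the replica count is just an HDFS setting in
> > > > > > hdfs-site.xml (3 is the usual default; shown here only to make the
> > > > > > point that HBase itself never manages it):
> > > > > >
> > > > > > <property>
> > > > > >   <name>dfs.replication</name>
> > > > > >   <value>3</value>
> > > > > > </property>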
> > > > > >
> > > > > > You can visit this link
> > > > > > <http://hbase.apache.org/book/perf.writing.html> to see how to
> > > > > > write efficiently into HBase. With a small field there should not
> > > > > > be any problem except storage and increased metadata, as you'll
> > > > > > have many small cells. If possible, club several small fields
> > > > > > together and store them in one cell.
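> > > > > >
> > > > > > As a rough sketch of both points - buffering writes on the client
> > > > > > and packing several small fields into a single cell - something
> > > > > > like the following against the 0.94-era client API; the table name,
> > > > > > column family, row key and field values are placeholders:
> > > > > >
> > > > > > import org.apache.hadoop.conf.Configuration;
> > > > > > import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > > > import org.apache.hadoop.hbase.client.HTable;
> > > > > > import org.apache.hadoop.hbase.client.Put;
> > > > > > import org.apache.hadoop.hbase.util.Bytes;
> > > > > >
> > > > > > public class PackedPutSketch {
> > > > > >   public static void main(String[] args) throws Exception {
> > > > > >     Configuration conf = HBaseConfiguration.create();
> > > > > >     HTable table = new HTable(conf, "mytable");
> > > > > >     try {
> > > > > >       table.setAutoFlush(false);                  // buffer puts on the client
> > > > > >       table.setWriteBufferSize(2L * 1024 * 1024); // flush roughly every 2 MB
> > > > > >
> > > > > >       // pack the small fields into one cell instead of one tiny cell per field
> > > > > >       String packed = "fieldA=1;fieldB=2;fieldC=3";
> > > > > >       Put put = new Put(Bytes.toBytes("row-00042"));
> > > > > >       put.add(Bytes.toBytes("d"), Bytes.toBytes("packed"), Bytes.toBytes(packed));
> > > > > >       table.put(put);
> > > > > >
> > > > > >       table.flushCommits();                       // push anything still buffered
> > > > > >     } finally {
> > > > > >       table.close();
> > > > > >     }
> > > > > >   }
> > > > > > }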
> > > > > >
> > > > > > HTH
> > > > > >
> > > > > > Warm Regards,
> > > > > > Tariq
> > > > > > cloudfront.blogspot.com
> > > > > >
> > > > > >
> > > > > > On Sat, Jun 22, 2013 at 8:31 PM, myhbase <myhb...@126.com> wrote:
> > > > > >
> > > > > >> Thanks for your response.
> > > > > >>
> > > > > >> Now if 5 servers are enough, how should I install and configure
> > > > > >> my nodes? If I need 3 replicas to guard against data loss, I
> > > > > >> should have at least 3 datanodes, and we still have the namenode,
> > > > > >> regionserver, HMaster and zookeeper nodes, so some of them must be
> > > > > >> installed on the same machine. The datanode seems to be the
> > > > > >> disk-I/O-sensitive node while the region server is the
> > > > > >> memory-sensitive one; can I install them on the same machine? Any
> > > > > >> suggestion on the deployment plan?
> > > > > >>
> > > > > >> My business requirement is much more write-heavy than read-heavy
> > > > > >> (about 7:3), and I have another concern: one field will be 8~15KB
> > > > > >> in size, and I am not sure whether there will be any problem in
> > > > > >> HBase when it runs compactions and region splits.
> > > > > >>
> > > > > >>> Oh, you already have a heavyweight's input :).
> > > > > >>>
> > > > > >>> Thanks JM.
> > > > > >>>
> > > > > >>> Warm Regards,
> > > > > >>> Tariq
> > > > > >>> cloudfront.blogspot.com
> > > > > >>>
> > > > > >>>
> > > > > >>> On Sat, Jun 22, 2013 at 8:05 PM, Mohammad Tariq <donta...@gmail.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>>> Hello there,
> > > > > >>>>
> > > > > >>>>          IMHO, 5-8 servers are sufficient to start with. But
> > > > > >>>> it's all relative to the data you have and the intensity of your
> > > > > >>>> reads/writes. You should have different strategies though, based
> > > > > >>>> on whether the workload is 'read' or 'write' heavy. You actually
> > > > > >>>> can't define 'big' in absolute terms: my cluster might be big for
> > > > > >>>> me, but for someone else it might not be big enough, and for yet
> > > > > >>>> another person it might be very big. Long story short, it depends
> > > > > >>>> on your needs. If you are able to achieve your goal with 5-8 RSs,
> > > > > >>>> then having more machines would be a waste, I think.
> > > > > >>>>
> > > > > >>>> But you should always keep in mind that HBase is kinda greedy when
> > > > > >>>> it comes to memory. For a decent load, 4G of heap is sufficient,
> > > > > >>>> IMHO, but again it depends on the operations you are going to
> > > > > >>>> perform. If you have a large cluster where you plan to run MR jobs
> > > > > >>>> frequently, you are better off with an additional 2G.
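> > > > > >>>>
> > > > > >>>> As a sketch, that sizing usually ends up as a single line in
> > > > > >>>> conf/hbase-env.sh on the region servers (the value is in MB on
> > > > > >>>> the releases of that era, so 4000 is roughly the 4G above; treat
> > > > > >>>> the exact number as something to tune for your workload):
> > > > > >>>>
> > > > > >>>> export HBASE_HEAPSIZE=4000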
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Warm Regards,
> > > > > >>>> Tariq
> > > > > >>>> cloudfront.blogspot.com
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On Sat, Jun 22, 2013 at 7:51 PM, myhbase <myhb...@126.com> wrote:
> > > > > >>>>
> > > > > >>>>> Hello All,
> > > > > >>>>>
> > > > > >>>>> I have learned HBase mostly from papers and books. According to
> > > > > >>>>> my understanding, HBase is the kind of architecture that is more
> > > > > >>>>> applicable to a big cluster: we should have many HDFS nodes and
> > > > > >>>>> many HBase (region server) nodes. If we only have a few servers
> > > > > >>>>> (5-8), it seems HBase is not a good choice; please correct me if
> > > > > >>>>> I am wrong. In addition, from how many nodes can we usually start
> > > > > >>>>> to consider the HBase solution, and what about the physical
> > > > > >>>>> memory size and other hardware resources in each node? Any
> > > > > >>>>> reference documents or cases? Thanks.
> > > > > >>>>>
> > > > > >>>>> --Ning
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
>



-- 
Kevin O'Dell
Systems Engineer, Cloudera
