Looking forward to the blog!

Thanks,
Chalcy

-----Original Message-----
From: lars hofhansl [mailto:la...@apache.org] 
Sent: Thursday, January 17, 2013 9:24 PM
To: user@hbase.apache.org
Subject: Re: Hbase heap size

You'll need more memory then, or more machines with less disk attached to each.

You can look at it this way:
- The largest useful region size is 20G (at least that is the current common 
tribal knowledge).
- Each region has at least one memstore (one per column family actually, let's 
just say one for the sake of argument).

If you have 10T of disk per region server, then you need ~170 regions per region 
server (3*20G*170 ~ 10T).
If you give the memstores 35% of your heap and have 128M memstores, you would 
need 170*128M/0.35 ~ 60G of heap. That's already too large.
If you make the memstores 600M, you'll need 170*600M/0.35 ~ 290G of heap (if 
all memstores are being written to simultaneously).
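For what it's worth, that back-of-the-envelope arithmetic can be written down as a small script (a sketch only; the 20G region size, 35% memstore fraction, and flush sizes are the figures from this thread, not tuned recommendations):

```python
# Back-of-the-envelope HBase heap sizing, using the figures from this thread.

def regions_per_server(disk_tb, region_size_gb=20, replication=3):
    """Regions needed to fill the given raw disk, accounting for replication."""
    usable_gb = disk_tb * 1024 / replication
    return usable_gb / region_size_gb

def heap_for_memstores(num_regions, memstore_gb, memstore_fraction=0.35):
    """Heap (GB) needed if every region's memstore fills up at the same time."""
    return num_regions * memstore_gb / memstore_fraction

regions = regions_per_server(10)                       # ~170 regions for 10T of disk
print(round(heap_for_memstores(regions, 128 / 1024)))  # ~60G heap with 128M memstores
print(round(heap_for_memstores(regions, 600 / 1024)))  # ~290G heap with 600M memstores
```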

There are ways to address that.
If you expect that not all memstores are written to at the same time, you can 
leave them smaller and increase their size multipliers, which allows them to be 
temporarily larger.
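To sketch what that buys you (the 25% "active" fraction below is a made-up illustration; the real blocking behavior is governed by hbase.hregion.memstore.block.multiplier and the global memstore limit):

```python
# Sketch: provisioning for a fraction of hot memstores instead of the worst case.
# The 0.25 active fraction is a hypothetical workload, not a measured number.

def heap_needed(regions, flush_size_gb, active_fraction, memstore_fraction=0.35):
    """Heap (GB) needed if only `active_fraction` of the memstores fill at once."""
    return regions * flush_size_gb * active_fraction / memstore_fraction

print(round(heap_needed(170, 128 / 1024, 1.0)))   # ~60G if every memstore is hot
print(round(heap_needed(170, 128 / 1024, 0.25)))  # ~15G if only a quarter are hot
```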

Again, this is just back of the envelope.

This is a lengthy topic; I'm planning a blog post around it. There are a 
bunch of parameters that can be tweaked based on workload.

The main takeaway for HBase is that you have to match disk space with Java 
heap.
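The 1/100 rule of thumb from the earlier message (quoted below) falls out of the same arithmetic. A sketch assuming the default figures Lars lists there (10G regions, 128M memstores, 40% of heap for memstores, 3-way replication):

```python
# Invert the calculation: how much raw disk can a region server with a given
# heap usefully back? Defaults here are the ones quoted later in this thread.

def servable_disk_tb(heap_gb, memstore_gb=128 / 1024, memstore_fraction=0.40,
                     region_size_gb=10, replication=3):
    """Raw disk (TB) one server can carry before memstores outgrow the heap."""
    max_regions = heap_gb * memstore_fraction / memstore_gb
    return max_regions * region_size_gb * replication / 1024

print(servable_disk_tb(10))   # ~1T for a 10G heap, i.e. roughly the 1/100 ratio
```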

-- Lars



________________________________
 From: Varun Sharma <va...@pinterest.com>
To: user@hbase.apache.org; lars hofhansl <la...@apache.org>
Sent: Thursday, January 17, 2013 3:24 PM
Subject: Re: Hbase heap size
 
Thanks for the info. I am looking for a balance: I have a write-heavy workload 
and need excellent read latency. So 40% of the heap to the block cache and 35% 
to the memstores.

But I would also like to reduce the number of HFiles and the amount of 
compaction activity, so I'd have a small number of regions and a much larger 
memstore flush size - like 640M. Could a large memstore flush be a problem in 
some sense? Are updates blocked during a memstore flush? In my case, I would 
expect a 600M memstore to materialize into a 200-300M HFile.

On Thu, Jan 17, 2013 at 2:31 PM, lars hofhansl <la...@apache.org> wrote:

> A good rule of thumb that I found is to give each region server a Java 
> heap that is roughly 1/100th of the size of the disk space per region 
> server.
> (that is assuming all the default setting: 10G regions, 128M 
> memstores, 40% of heap for memstores, 20% of heap for block cache, 
> 3-way replication)
>
>
> That is, if you give the region server a 10G heap, you can expect to 
> be able to serve about 1T worth of disk space.
>
> That can be tweaked of course (increase the region size to 20G; if 
> your load is mostly read-only, shrink the memstores; etc.).
> That way you can reduce that ratio to 1/200 or even less.
>
>
> I'm sure other folks will have more detailed input.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Varun Sharma <va...@pinterest.com>
> To: user@hbase.apache.org
> Sent: Thursday, January 17, 2013 1:15 PM
> Subject: Hbase heap size
>
> Hi,
>
> I was wondering how much heap folks typically give to HBase and how 
> much they leave for the file system cache on the region server. I am 
> using hbase
> 0.94 and running only the region server and data node daemons. I have 
> a system with 15G of RAM.
>
> Thanks
>
