Hi,
Thanks Ted. We are using the default split policy, and our flush size is 64 MB.
And the size is calculated based on the formula
Math.min(getDesiredMaxFileSize(), initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
If this size exceeds the max region size (10 GB), the max region size is used.
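For reference, that calculation can be sketched in plain Java. The constants below use this thread's values (64 MB flush size, 10 GB max region size); the 2 x flush-size default for initialSize and the fact that tableRegionsCount counts the table's regions on the local region server are my reading of IncreasingToUpperBoundRegionSplitPolicy, so verify against your HBase version's source:

```java
public class SplitSizeSketch {
    static final long MB = 1024L * 1024;

    // Assumed values from this thread:
    static final long FLUSH_SIZE = 64 * MB;            // hbase.hregion.memstore.flush.size
    static final long MAX_FILE_SIZE = 10 * 1024 * MB;  // hbase.hregion.max.filesize (10 GB)

    // initialSize defaults to 2 * flush size when
    // hbase.increasing.policy.initial.size is not set.
    static final long INITIAL_SIZE = 2 * FLUSH_SIZE;

    // tableRegionsCount is the number of this table's regions on the local RS.
    static long splitSize(long tableRegionsCount) {
        return Math.min(MAX_FILE_SIZE,
                INITIAL_SIZE * tableRegionsCount * tableRegionsCount * tableRegionsCount);
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 5; n++) {
            System.out.println(n + " region(s) on this RS -> split at "
                    + splitSize(n) / MB + " MB");
        }
    }
}
```

With these numbers the split threshold grows 128 MB, 1 GB, ~3.4 GB, 8 GB, then stays capped at 10 GB.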
Split policy may play a role here.
Please take a look at:
http://hbase.apache.org/book.html#_custom_split_policies
On Mon, May 15, 2017 at 1:48 AM, Rajeshkumar J
wrote:
Hi,
As we run MapReduce over HBase, it takes each region as the input for one
mapper. I have set the region max size to 10 GB. If a region only has about
5 GB, will it take 5 GB of data as the mapper's input?
Thanks
Hi Ted,
Thanks for the reply.
I couldn't find hbase.increasing.policy.initial.size in the HBase conf;
we haven't changed that value.
So that means the initial region size should be 2 GB, but the region size is 2.5 TB.
I can manually split the regions, but I'm trying to figure out the root cause.
Are any other conf properties causing this behavior? I'd like to understand
what configuration parameter caused this issue.
>
> I was going through this article
> http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
>
> Region split policy in our HBase is
> org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy
According to this split policy, the region server should split regions when
the region size limit is exceeded.
Can someone explain the root cause to me?
Thanks,
Yeshwanth
I was thinking about merging regions, because of the overhead of managing
them (metadata, a memstore per region, more flushes, more compactions).
Any suggestions? What is the average region size in your case?
Thanks.
If data is not skewed, you never have hot spots.
If it's skewed, you would start solving completely different problems :)
2015-09-07 14:59 GMT+02:00 Ted Yu <yuzhih...@gmail.com>:
For the 96 region table, region size is too small.
In production, I have seen region size as high as 50GB.
FYI
> On Sep 7, 2015, at 2:55 AM, Akmal Abbasov <akmal.abba...@icloud.com> wrote:
>
> Hi,
> I would like to know the pros and cons of small region sizes.
Hi All,
In one of my test clusters, I have set the region size to 1 GB and I am using
Snappy compression.
The combined size of store files under that table is 50 GB, yet I see
around 100 regions for that table. I am assuming that the compression ratio
is 50%, so the uncompressed data size would be around 100 GB.
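A back-of-the-envelope check of those numbers (the 50 GB store file total, 50% ratio, and 1 GB region size are taken from the message above; this is plain arithmetic, not an HBase API):

```java
public class CompressionMath {
    static final long GB = 1L << 30;

    // Uncompressed size given on-disk size and a compression ratio in (0, 1].
    static double uncompressed(long onDiskBytes, double ratio) {
        return onDiskBytes / ratio;
    }

    public static void main(String[] args) {
        long storeFiles = 50 * GB;   // combined store file size from the post
        System.out.println("uncompressed ~ " + uncompressed(storeFiles, 0.5) / GB + " GB");
        // Note: splits are driven by on-disk (compressed) store file size,
        // so ~50 GB on disk at a 1 GB region size would suggest ~50 regions.
        System.out.println("expected regions ~ " + storeFiles / GB);
    }
}
```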
How small is small and how large is large?
Recommended region size is usually between 5-10 GB. Too-small regions result in
more frequent flushes/compactions
and add overhead in RS RAM.
I am thinking about extending TableInputFormat to override the
1-map-per-region default policy.
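The override idea boils down to carving each region's key range into N sub-ranges and emitting one input split per sub-range, which is what HBASE-5140 later explored. A minimal sketch, using numeric keys as a stand-in for HBase row keys (nothing below is actual HBase API):

```java
import java.util.ArrayList;
import java.util.List;

public class RegionSubSplits {
    // Divide one region's key range [start, end) into n roughly equal sub-ranges.
    static List<long[]> subSplits(long start, long end, int n) {
        List<long[]> splits = new ArrayList<>();
        long span = end - start;
        for (int i = 0; i < n; i++) {
            long s = start + span * i / n;
            long e = (i == n - 1) ? end : start + span * (i + 1) / n;
            splits.add(new long[]{s, e});
        }
        return splits;
    }

    public static void main(String[] args) {
        // One region covering keys [0, 1000) becomes 4 map inputs.
        for (long[] r : subSplits(0, 1000, 4)) {
            System.out.println(r[0] + " - " + r[1]);
        }
    }
}
```

A real TableInputFormat subclass would do the same partitioning on each region's byte[] start/end keys inside getSplits().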
What is the current region count? How many MB/s are you ingesting into your
cluster?
Do you write equally to all regions during ingest?
On Sun, Mar 23, 2014 at 3:51 PM, Vladimir Rodionov
vrodio...@carrieriq.com wrote:
Hello,
We run M/Rs over several HBase tables at the same time and chose to reduce
region sizes in order to make map tasks faster and improve map-slot
turnaround between the concurrent jobs. However, I am worried many regions
will cause longer overall compactions of the HBase data. Is this the case?
David:
Have you looked at HBASE-3996, 'Support multiple tables and scanners as
input to the mapper in map/reduce jobs'?
Cheers
On Sat, Mar 22, 2014 at 6:58 PM, David Koch ogd...@googlemail.com wrote:
Hi Ted,
Thank you for your reply. I am aware of the possibility of scanning over
multiple tables in one M/R; however, this is not applicable in our case.
Regards,
/David
On Sun, Mar 23, 2014 at 3:10 AM, Ted Yu yuzhih...@gmail.com wrote:
See HBASE-5140, 'TableInputFormat subclass to allow N number of splits per
region during MR jobs', where there was some unfinished work.
Cheers
On Sat, Mar 22, 2014 at 7:28 PM, David Koch ogd...@googlemail.com wrote:
Hi All
I'm wondering, is there a simple way to decrease the number of regions
for a table? We are using HBase 0.94.6-cdh4.4.0; one of our tables has more
than 100 regions. Following is some information about the table:
SPLIT_POLICY =
You can increase hbase.hregion.max.filesize and do a region merge (offline
merge suggested: hbase org.apache.hadoop.hbase.util.Merge tbl_name region_1
region_2). You can use compression to reduce per-region size.
On Thu, Jan 16, 2014 at 12:27 PM, Ramon Wang ra...@appannie.com wrote:
Hi
Can anyone tell me the Java API for getting the region size of a table?
Thanks!
Hi Vineet,
If you want the entire table size, I don't think there is an API for that.
If you want the size of the table on disk (compressed), then you are
better off using the HDFS API.
JM
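To make the HDFS suggestion concrete: you would point FileSystem.getContentSummary() (or a recursive listing) at the table's directory, e.g. /hbase/<table> in 0.94-era layouts. Below is the same walk sketched against the local filesystem using only java.nio, as an illustration rather than HDFS code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class DirSize {
    // Sum the sizes of all regular files under dir (analogous to what
    // FileSystem.getContentSummary() reports for an HDFS path).
    static long sizeOf(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("table-dir");
        Files.write(dir.resolve("hfile1"), new byte[4096]);
        System.out.println("bytes under " + dir + ": " + sizeOf(dir));
    }
}
```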
2013/12/2 Vineet Mishra clearmido...@gmail.com
Are you looking to get the MAX_FILESIZE parameter? If so, there's nothing in
the client, but HBaseAdmin has what you need [1].
HTableDescriptor myDescriptor =
    hbaseAdmin.getTableDescriptor(Bytes.toBytes("my-table"));
System.out.println("my-table has a max region size of " +
    myDescriptor.getMaxFileSize());
1:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html
Great that we agree on having most of the metrics on the Master UI rather
than in a new tool. Also, we found that Hannibal is very CPU hungry.
@Bryan thanks for creating the issue. I will get started on this. To
start with, I will have those same metrics as we have on the regionserver
page per
Hi Devs/Users,
Most of the time we want to know if our table split logic is
accurate or if our current regions are well balanced for a table. I was
wondering if we can expose the size of each region on table.jsp too, in
the table's region listing. If people think it is useful, I can pick it up.
Hi Samar
Hannibal is already doing what you are looking for.
Cheers,
JMS
2013/8/1 samar.opensource samar.opensou...@gmail.com
Hi Jean,
You are right, Hannibal does that, but it is a separate process we need
to install/maintain. I thought it would be quick and easy to see it
from the master-status page. The stats are already on the regionserver
page (like total size of the store); it just would make sense to
Hi, Bryan. If you file an issue for that, it would be nice to work on it.
2013/8/1 Bryan Beaudreault bbeaudrea...@hubspot.com
Hannibal is very useful, but samar is right. It's another thing to install
and maintain. I'd hope that over time the need for tools like Hannibal
would be lessened.
Created https://issues.apache.org/jira/browse/HBASE-9113
On Thu, Aug 1, 2013 at 5:34 PM, Marcos Luis Ortiz Valmaseda
marcosluis2...@gmail.com wrote:
I think the answer to this is no, but I am hoping someone with more
experience can confirm this… we are on hbase 0.90.4 (from cdh3u2). Some of our
storefiles have grown into the 3-4GB range (we have a 100GB max region size).
Ignoring compactions, do large storefiles like this have a negative
impact on random reads? We only recently started doing a large number of
random gets, so I have no history to go on in terms of correlating storefile
size with performance.
Paul Mackles pmack...@adobe.com wrote:
On Wed, May 2, 2012 at 6:00 PM, Paul Mackles pmack...@adobe.com wrote:
Thanks for the tip Doug. Does that boost come largely from the HDFS
improvements?
Yeah, unless you install 0.92.x hbase (or if you want more
improvement, install 0.94.x RC).
St.Ack
The funny thing about tuning... what works for one situation may not work well
for others.
Using the old recommendation of never exceeding 1000 regions per RS, keeping it
low around 100-200, monitoring tables, and changing the region size on a
table-by-table basis, we are doing OK.
( of course
Do we know what would need to change in HBase in order to be able to manage
more regions per regionserver?
With 20 regions per server, one would need 300G regions to just utilize 6T of
drive space.
To utilize a regionserver/datanode with 24T drive space the region size would
be an insane 1T.
-- Lars
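That arithmetic is easy to verify with a throwaway calculation (figures taken from the message above):

```java
public class RegionMath {
    static final long GB = 1L << 30;
    static final long TB = 1L << 40;

    // Region size needed so that `regions` regions fill `diskBytes` of drive space.
    static long requiredRegionSize(long diskBytes, int regions) {
        return diskBytes / regions;
    }

    public static void main(String[] args) {
        // 20 regions per server over 6 TB -> about 300 GB per region.
        System.out.println(requiredRegionSize(6 * TB, 20) / GB + " GB");
        // 20 regions per server over 24 TB -> about 1 TB per region (the "insane" case).
        System.out.println(requiredRegionSize(24 * TB, 20) / TB + " TB");
    }
}
```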
From: Nicolas Spiegelberg nspiegelb...@fb.com
HI all,
My HBase cluster is 10 nodes; each node has 12 cores, 48G RAM, 24TB disk,
10G Ethernet.
My region size is 1GB.
Any guidelines on how many regions can a RS handle comfortably?
I vaguely remember reading somewhere to have no more than 1000 regions per
server; that comes to 1TB per server.
These days I think the recommendation is more like 20 regions per
region server, and the region size set accordingly. The major caveat
is that when you start compacting the bigger store files you can
really take a massive IO hit, so most of the time major compactions
are tuned to run only every
optimizations for compactions in 0.92. In our case we have a pretty
old setup and had way too many regions so we ran a few online merges
to bring this down to like 80 regions/RS and it's working pretty well.
J-D
What is the region size you use?
20GB, less in some cases for small tables.
And is it 80 regions / table / region-server, or 80 regions / all tables /
regionserver?
80 regions total / RS.
J-D
On Tue, Nov 1, 2011 at 2:46 PM, Sujee Maniyam su...@sujee.net wrote:
20GB, compressed? If so, is it LZO or Snappy?
The region size is expressed in terms of size on disk, in our case it's LZOed.
J-D
Is it possible to control the region size (hstore size) on a per-table
basis? I have certain applications where the overall keyspace is small
but I'd like the data to spread nicely over many region servers that use
a certain table, and another one that has potentially 2 orders of
magnitude more data, and I'd like to keep
From: Stack st...@duboce.net
3. The size of them varies like this:
70% of them are under 1 MB
29% of them are between 1 MB and 10 MB
1% of them are over 10 MB (they can also reach 100 MB)
What David says above though
On 01/07/11 10:23, Andrew Purtell wrote:
From: Andrew Purtell apurt...@apache.org
Subject: Re: HBase region size
To: user@hbase.apache.org user@hbase.apache.org
Date: Friday, July 1, 2011, 4:23 AM
From: Stack st...@duboce.net
One reasonable way to handle native storage of large objects in HBase would
be to introduce a layer of indirection.
Do you see this layer on the client or on the server side?
Client side.
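A client-side indirection layer can be sketched like this: values above a size threshold go to an external blob store (e.g. files in HDFS) and the HBase cell keeps only a pointer. Everything here is illustrative, with in-memory maps standing in for the HBase table and the blob store; the "ptr:" prefix convention is an assumption of the sketch, not anything in HBase:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class IndirectionSketch {
    static final int THRESHOLD = 10 * 1024 * 1024; // e.g. a 10 MB cutoff

    // Stand-ins for an HBase table and an external blob store.
    final Map<String, byte[]> table = new HashMap<>();
    final Map<String, byte[]> blobStore = new HashMap<>();

    void put(String row, byte[] value) {
        if (value.length < THRESHOLD) {
            table.put(row, value);                      // small: store inline
        } else {
            String ref = "blob://" + UUID.randomUUID();
            blobStore.put(ref, value);                  // large: store externally
            table.put(row, ("ptr:" + ref).getBytes()); // keep only the pointer
        }
    }

    byte[] get(String row) {
        byte[] v = table.get(row);
        if (v == null) return null;
        String s = new String(v);
        // Follow the pointer for externally stored values.
        return s.startsWith("ptr:") ? blobStore.get(s.substring(4)) : v;
    }

    public static void main(String[] args) {
        IndirectionSketch ix = new IndirectionSketch();
        ix.put("row1", new byte[]{1, 2, 3});
        ix.put("row2", new byte[THRESHOLD + 1]);
        System.out.println("row1 -> " + ix.get("row1").length + " bytes");
        System.out.println("row2 -> " + ix.get("row2").length + " bytes");
    }
}
```

Updating a large object then becomes a write to the blob store plus a single small pointer update in HBase, which keeps region sizes independent of object sizes.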
I was also thinking about updates: let's say we store a new version of
the large object, which is
regionserver for better performance
and reducing I/O ?
Scan reads don't care about file size (bigger may actually be slightly
faster). Random read performance also is unrelated to file/region
size (we consult the in-memory index to figure out where to jump to to
start the read -- this should be the same for big or small files).
regionserver is carrying all load whereas if you had more
regions, the hot section could be distributed about the cluster.
I'd rather keep the region size unlimited and, if a region gets hot,
manually split and move it. Any risk associated with this approach?
Sure. You could do this. (In one
Hi Andrew,
Thanks for your replies.
I may give this indirection layer a try one day if someone does not
pick it up before me :)
On 01/07/11 18:34, Andrew Purtell wrote:
contributing to taking the right decision about using the right tool.
Thank you,
Florin
--- On Tue, 6/28/11, Buttler, David buttl...@llnl.gov wrote:
From: Buttler, David buttl...@llnl.gov
Subject: RE: HBase region size
To: user@hbase.apache.org user@hbase.apache.org
Date: Tuesday, June 28
The region size would be 256 MB.
3. Each datanode has at least 32 TB of disk space. (We may add
more datanodes to accommodate 1 PB.)
The question here is: if we have a region size of 256 MB, will we still
have a problem of too many small files in Hadoop for the number of
regions it may create?
size. If you only have one CF/column holding
a single 5MB object, then you should be fine.
* The larger your region size, the less overhead there is for storage, and the
fewer total regions you will need. The drawback is that random access will be
slower. Given your object size