Re: region size for a mapper

2017-05-15 Thread Rajeshkumar J
Hi, Thanks, Ted. We are using the default split policy and our flush size is 64 MB. The size is calculated with the formula Math.min(getDesiredMaxFileSize(), initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount); If this size exceeds the max region size (10 GB), then max
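The formula quoted in this message can be tried out with a small sketch. This is only an illustration of the arithmetic, not the HBase source: the class and method names are made up for the example, and the initialSize of 128 MB is an assumption (twice the 64 MB flush size mentioned above), which the thread itself does not confirm.

```java
// Illustration of the quoted formula only; SplitSizeSketch and sizeToCheck
// are hypothetical names, not part of HBase.
final class SplitSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // desiredMaxFileSize: the configured max region size (10 GB in the thread)
    // initialSize:        base size that grows with the region count (assumed value)
    // tableRegionsCount:  number of regions the formula is evaluated for
    static long sizeToCheck(long desiredMaxFileSize, long initialSize, int tableRegionsCount) {
        long n = tableRegionsCount;
        // Cubic growth, capped by the max region size.
        // Note: this sketch does not guard against overflow for very large n.
        return Math.min(desiredMaxFileSize, initialSize * n * n * n);
    }

    public static void main(String[] args) {
        long max = 10 * GB;       // max region size from the thread
        long initial = 128 * MB;  // assumption: 2 x 64 MB flush size
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " region(s): split at "
                    + sizeToCheck(max, initial, n) / MB + " MB");
        }
    }
}
```

Under these assumptions the threshold grows quickly with the region count and is capped at the 10 GB maximum once the cubic term exceeds it.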

Re: region size for a mapper

2017-05-15 Thread Ted Yu
Split policy may play a role here. Please take a look at: http://hbase.apache.org/book.html#_custom_split_policies On Mon, May 15, 2017 at 1:48 AM, Rajeshkumar J wrote: > Hi, > > As we run mapreduce over hbase it will take each region as input for each > mapper.

region size for a mapper

2017-05-15 Thread Rajeshkumar J
Hi, As we run mapreduce over hbase it will take each region as input for each mapper. I have set the region max size to 10 GB. If I have about 5 GB of data, will it take 5 GB of data as input for the mappers? Thanks

Re: HBase Region Size of 2.5 TB

2016-08-28 Thread Ted Yu
initial.size in hbase conf, > we haven't changed that value. > > so that means initial region size should be 2 GB, but the region size is > 2.5TB > i can manually split the regions, but trying to figure out the root cause. > any other conf properties causing this behavior? > >

Re: HBase Region Size of 2.5 TB

2016-08-28 Thread yeshwanth kumar
Hi Ted, thanks for the reply. I couldn't find hbase.increasing.policy.initial.size in the hbase conf; we haven't changed that value. So that means the initial region size should be 2 GB, but the region size is 2.5 TB. I can manually split the regions, but I am trying to figure out the root cause. any other

Re: HBase Region Size of 2.5 TB

2016-08-26 Thread Ted Yu
nd, what configuration parameter caused this issue. > > i was going through this article > http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/ > > Region split policy in our HBase is > org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplit

HBase Region Size of 2.5 TB

2016-08-26 Thread yeshwanth kumar
is org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy. According to the region split policy, the region server should create regions when the region size limit is exceeded. Can someone explain the root cause? Thanks, Yeshwanth

HBase table region size

2015-09-07 Thread Akmal Abbasov
regions. I was thinking about merging regions because of the overhead of managing them (metadata, memstore per region, more flushes, more compactions). Any suggestions? What is the avg region size in your case? Thanks.

Re: HBase table region size

2015-09-07 Thread Serega Sheypak
ata is not skewed, you never ever have hot spots. If it's skewed, you would start to solve completely different problems :) 2015-09-07 14:59 GMT+02:00 Ted Yu <yuzhih...@gmail.com>: > For the 96 region table, region size is too small. > > In production, I have seen region size as hig

Re: HBase table region size

2015-09-07 Thread Ted Yu
For the 96 region table, region size is too small. In production, I have seen region size as high as 50GB. FYI > On Sep 7, 2015, at 2:55 AM, Akmal Abbasov <akmal.abba...@icloud.com> wrote: > > Hi, > I would like to know about pros and cons against small region sizes.

Region Size == Size of Compressed Store file or Actual Size of Data in Store?

2014-05-15 Thread anil gupta
Hi All, In one of my test clusters, I have set the region size to 1 GB and I am using Snappy compression. The combined size of the store files under that table is 50 GB. Even so, I see around 100 regions for that table. I am assuming that the compression ratio is 50%. So, uncompressed data size

Re: Effect of region size on compaction performance

2014-03-26 Thread David Koch
Rodionov vrodio...@carrieriq.com wrote: How small is small and how large is large? Recommended region size is usually between 5-10GB. Regions that are too small result in more frequent flushes/compactions and have additional overhead in RS RAM. I am thinking about extending TableInputFormat

RE: Effect of region size on compaction performance

2014-03-23 Thread Vladimir Rodionov
How small is small and how large is large? Recommended region size is usually between 5-10GB. Regions that are too small result in more frequent flushes/compactions and have additional overhead in RS RAM. I am thinking about extending TableInputFormat to override the 1-map-per-region default policy

Re: Effect of region size on compaction performance

2014-03-23 Thread Kevin O'dell
is the current region count? How many MB/s are you ingesting into your cluster? Do you write equally to all regions during ingest? On Sun, Mar 23, 2014 at 3:51 PM, Vladimir Rodionov vrodio...@carrieriq.com wrote: How small is small and how large is large? Recommended region size is usually between 5-10GB

Effect of region size on compaction performance

2014-03-22 Thread David Koch
Hello, We run M/Rs over several HBase tables at the same time and chose to reduce region sizes in order to make map tasks faster and improve map-slot turnaround between the concurrent jobs. However, I am worried many regions will cause longer overall compactions of the HBase data. Is this the

Re: Effect of region size on compaction performance

2014-03-22 Thread Ted Yu
David: Have you looked at HBASE-3996 'Support multiple tables and scanners as input to the mapper in map/reduce jobs'? Cheers On Sat, Mar 22, 2014 at 6:58 PM, David Koch ogd...@googlemail.com wrote: Hello, We run M/Rs over several HBase tables at the same time and chose to reduce region

Re: Effect of region size on compaction performance

2014-03-22 Thread David Koch
Hi Ted, Thank you for your reply. I am aware of the possibility of scanning over multiple tables in one M/R however this is not applicable in our case. Regards, /David On Sun, Mar 23, 2014 at 3:10 AM, Ted Yu yuzhih...@gmail.com wrote: David: Have you looked at HBASE-3996 ' Support multiple

Re: Effect of region size on compaction performance

2014-03-22 Thread Ted Yu
See HBASE-5140 TableInputFormat subclass to allow N number of splits per region during MR jobs where there was some unfinished work. Cheers On Sat, Mar 22, 2014 at 7:28 PM, David Koch ogd...@googlemail.com wrote: Hi Ted, Thank you for your reply. I am aware of the possibility of scanning

How to shrink HBase region size?

2014-01-15 Thread Ramon Wang
Hi All, I'm wondering whether there is a simple way to decrease the number of regions for a table? We are using HBase 0.94.6-cdh4.4.0, and one of our tables has more than 100 regions. The following is some information about the table: SPLIT_POLICY =

Re: How to shrink HBase region size?

2014-01-15 Thread Bharath Vissapragada
to reduce per region size. On Thu, Jan 16, 2014 at 12:27 PM, Ramon Wang ra...@appannie.com wrote: Hi All I'm wondering is there a simple way we can decrease the number of regions for a table? We are using HBase 0.94.6-cdh4.4.0, one of our table has more than 100 regions, following is some

Re: How to shrink HBase region size?

2014-01-15 Thread Ramon Wang
increase the hbase.hregion.max.filesize and do region merge (offline merge suggested, hbase org.apache.hadoop.hbase.util.Merge tbl_name region_1 region_2). You can use compression to reduce per region size. On Thu, Jan 16, 2014 at 12:27 PM, Ramon Wang ra...@appannie.com wrote: Hi All I'm

Hbase Region Size

2013-12-02 Thread Vineet Mishra
Hi, can anyone tell me the Java API for getting the region size of a table? Thanks!

Re: Hbase Region Size

2013-12-02 Thread Jean-Marc Spaggiari
Hi Vineet, If you want the entire table size I don't think there is any API for that. If you want the size of the table on the disk (compressed) then you are better off using the HDFS API. JM 2013/12/2 Vineet Mishra clearmido...@gmail.com Hi Can Anyone tell me the Java API for getting the Region

Re: Hbase Region Size

2013-12-02 Thread Mike Axiak
Are you looking to get the MAX_FILESIZE parameter? If so, there's nothing in the client, but HBaseAdmin has what you need [1]. HTableDescriptor myDescriptor = hbaseAdmin.getDescriptor(Bytes.toBytes("my-table")); System.out.println("my-table has a max region size

Re: Hbase Region Size

2013-12-02 Thread Jean-Marc Spaggiari
= hbaseAdmin.getDescriptor(Bytes.toBytes("my-table")); System.out.println("my-table has a max region size of " + myDescriptor.getMaxFileSize()); 1: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html On Mon, Dec 2, 2013 at 9:05 AM, Jean-Marc Spaggiari jean-m

Re: Hbase Region Size

2013-12-02 Thread Vineet Mishra
myDescriptor = hbaseAdmin.getDescriptor(Bytes.toBytes("my-table")); System.out.println("my-table has a max region size of " + myDescriptor.getMaxFileSize()); 1: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html On Mon, Dec 2, 2013 at 9:05 AM, Jean

Re: Hbase Region Size

2013-12-02 Thread Jean-Marc Spaggiari
in the client, but HBaseAdmin has what you need [1]. HTableDescriptor myDescriptor = hbaseAdmin.getDescriptor(Bytes.toBytes("my-table")); System.out.println("my-table has a max region size of " + myDescriptor.getMaxFileSize()); 1: http://hbase.apache.org/apidocs

Re: Hbase Region Size

2013-12-02 Thread Jean-Marc Spaggiari
= hbaseAdmin.getDescriptor(Bytes.toBytes("my-table")); System.out.println("my-table has a max region size of " + myDescriptor.getMaxFileSize()); 1: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html On Mon, Dec 2, 2013 at 9:05 AM, Jean-Marc

Re: Hbase Region Size

2013-12-02 Thread Asaf Mesika
the Region Size of a table! Thanks!

Re: Region size per region on the table page

2013-08-02 Thread samar.opensource
Great that we agree to having most of the metrics on Master UI rather than a new tool. Also we found that hannibal is very CPU hungry. @Bryan thanks for creating the issue. I will get started on this. To start with I will have those same metrics as we have on the regionserver page per

Region size per region on the table page

2013-08-01 Thread samar.opensource
Hi Devs/Users, Most of the time we want to know if our table split logic is accurate or if our current regions are well balanced for a table. I was wondering if we can expose the size of each region on table.jsp too, in the table of regions. If people think it is useful I can pick it up.

Re: Region size per region on the table page

2013-08-01 Thread Jean-Marc Spaggiari
Hi Samar, Hannibal is already doing what you are looking for. Cheers, JMS 2013/8/1 samar.opensource samar.opensou...@gmail.com Hi Devs/Users, Most of the time we want to know if our table split logic is accurate or if our current regions are well balanced for a table. I was wondering if

Re: Region size per region on the table page

2013-08-01 Thread samar.opensource
Hi Jean, You are right, Hannibal does that, but it is a separate process we need to install/maintain. I thought it would be good to have a quick and easy way to see it from the master-status page. The stats are already on the regionserver page (like total size of the store), just that it would make sense to

Re: Region size per region on the table page

2013-08-01 Thread Marcos Luis Ortiz Valmaseda
Hi, Bryan. If you file an issue for that, it would be nice to work on it. 2013/8/1 Bryan Beaudreault bbeaudrea...@hubspot.com Hannibal is very useful, but samar is right. It's another thing to install and maintain. I'd hope that over time the need for tools like hannibal would be lessened

Re: Region size per region on the table page

2013-08-01 Thread Bryan Beaudreault
Created https://issues.apache.org/jira/browse/HBASE-9113 On Thu, Aug 1, 2013 at 5:34 PM, Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com wrote: Hi, Bryan. If you file an issue for that, it would be nice to work on it. 2013/8/1 Bryan Beaudreault bbeaudrea...@hubspot.com Hannibal is

region size

2012-05-02 Thread Paul Mackles
I think the answer to this is no, but I am hoping someone with more experience can confirm this… we are on hbase 0.90.4 (from cdh3u2). Some of our storefiles have grown into the 3-4GB range (we have 100GB max region size). Ignoring compactions, do large storefiles like this have a negative

Re: region size

2012-05-02 Thread Jean-Daniel Cryans
storefiles have grown into the 3-4GB range (we have 100GB max region size). Ignoring compactions, do large storefiles like this have a negative impact on random reads? We only recently started doing a large number of random gets so I have no history to go on in terms of correlating storefile size

Re: region size

2012-05-02 Thread Doug Meil
are on hbase 0.90.4 (from cdh3u2). Some of our storefiles have grown into the 3-4GB range (we have 100GB max region size). Ignoring compactions, do large storefiles like this have a negative impact on random reads? We only recently started doing a large number of random gets so I have no history to go

Re: region size

2012-05-02 Thread Paul Mackles
:29 PM, Paul Mackles pmack...@adobe.com wrote: I think the answer to this is no, but I am hoping someone with more experience can confirm this… we are on hbase 0.90.4 (from cdh3u2). Some of our storefiles have grown into the 3-4GB range (we have 100GB max region size). Ignoring compactions, do large

Re: region size

2012-05-02 Thread Stack
On Wed, May 2, 2012 at 6:00 PM, Paul Mackles pmack...@adobe.com wrote: Thanks for the tip Doug. Does that boost come largely from the HDFS improvements? Yeah, unless you install 0.92.x hbase (or if you want more improvement, install 0.94.x RC). St.Ack

Re: region size/count per regionserver

2011-11-04 Thread Michel Segel
The funny thing about tuning... What works for one situation may not work well for others. Using the old recommendation of never exceeding 1000 R per RS, keeping it low around 100-200 and monitoring tables and changing the Region Size on a table-by-table basis, we are doing OK. (of course

Re: region size/count per regionserver

2011-11-04 Thread Mikael Sitruk
...@hotmail.com wrote: The funny thing about tuning... What works for one situation may not work well for others. Using the old recommendation of never exceeding 1000 R per RS, keeping it low around 100-200 and monitoring tables and changing the Region Size on a table-by-table basis, we are doing OK. (of course

Re: region size/count per regionserver

2011-11-02 Thread lars hofhansl
Do we know what would need to change in HBase in order to be able to manage more regions per regionserver? With 20 regions per server, one would need 300G regions to just utilize 6T of drive space. To utilize a regionserver/datanode with 24T drive space the region size would be an insane 1T
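The arithmetic in this message can be reproduced with a short sketch. It is a back-of-the-envelope calculation only: the class and method names are invented for the example, it uses decimal units (1T = 1000G) to match the figures quoted, and it ignores HDFS replication and any other overhead.

```java
// Back-of-the-envelope sizing, not an HBase API; names are hypothetical.
final class RegionSizingSketch {
    // Region size (in GB) needed so that regionsPerServer regions
    // fill diskGb of drive space on one server.
    static long regionSizeGbToFill(long diskGb, int regionsPerServer) {
        return diskGb / regionsPerServer;
    }

    public static void main(String[] args) {
        int regionsPerServer = 20; // recommendation discussed in the thread
        // 6T of drive space, expressed in GB (decimal units)
        System.out.println("6T:  " + regionSizeGbToFill(6_000, regionsPerServer) + "G per region");
        // 24T of drive space
        System.out.println("24T: " + regionSizeGbToFill(24_000, regionsPerServer) + "G per region");
    }
}
```

With 20 regions per server this gives 300G regions for 6T and 1200G (a little over the "1T" quoted) for 24T.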

Re: region size/count per regionserver

2011-11-02 Thread Nicolas Spiegelberg
per regionserver? With 20 regions per server, one would need 300G regions to just utilize 6T of drive space. To utilize a regionserver/datanode with 24T drive space the region size would be an insane 1T. -- Lars From: Nicolas Spiegelberg nspiegelb...@fb.com

region size/count per regionserver

2011-11-01 Thread Sujee Maniyam
Hi all, My HBase cluster is 10 nodes, each node has 12 cores, 48G RAM, 24TB disk, 10G Ethernet. My region size is 1GB. Any guidelines on how many regions a RS can handle comfortably? I vaguely remember reading somewhere to have no more than 1000 regions / server; that comes to 1TB / server

Re: region size/count per regionserver

2011-11-01 Thread Jean-Daniel Cryans
These days I think the recommendation is more like 20 regions per region server, and the region size set accordingly. The major caveat is that when you start compacting the bigger store files you can really take a massive IO hit, so most of the time major compactions are tuned to run only every

Re: region size/count per regionserver

2011-11-01 Thread Sujee Maniyam
optimizations for compactions in 0.92. In our case we have a pretty old setup and had way too many regions so we ran a few online merges to bring this down to like 80 regions/RS and it's working pretty well. J-D what is the region size you use? and is it 80 regions / table / region-server

Re: region size/count per regionserver

2011-11-01 Thread Jean-Daniel Cryans
is the region size you use? 20GB, less in some cases for small tables. and is it 80 regions / table / region-server?   or 80 regions / all tables / regionserver? 80 regions total / RS J-D

Re: region size/count per regionserver

2011-11-01 Thread Jean-Daniel Cryans
On Tue, Nov 1, 2011 at 2:46 PM, Sujee Maniyam su...@sujee.net wrote: 20GB, compressed ?  If so is it LZO or Snappy? The region size is expressed in terms of size on disk, in our case it's LZOed. J-D

Re: region size/count per regionserver

2011-11-01 Thread Nicolas Spiegelberg
...@sujee.net wrote: Hi all, My HBase cluster is 10 nodes, each node has 12 cores, 48G RAM, 24TB disk, 10G Ethernet. My region size is 1GB. Any guidelines on how many regions a RS can handle comfortably? I vaguely remember reading somewhere to have no more than 1000 regions / server; that comes to 1TB

per table region size

2011-08-08 Thread Arvind Jayaprakash
Is it possible to control the region size (hstore size) on a per table basis? I have certain applications where the overall keyspace is small but I'd like the data to spread nicely over many region servers that use a certain table and another one that has potentially 2 orders of magnitude of data

Re: per table region size

2011-08-08 Thread Ravi Veeramachaneni
the region size (hstore size) on a per table basis? I have certain applications where the overall keyspace is small but I'd like the data to spread nicely over many region servers that use a certain table and another one that has potentially 2 orders of magnitude of data and I'd like to keep

Re: HBase region size

2011-07-01 Thread Stack
performance also is unrelated to file/region size (We consult the in-memory index to figure where to jump to to start the read -- this should be the same for big or small files). St.Ack

Re: HBase region size

2011-07-01 Thread Andrew Purtell
From: Stack st...@duboce.net 3. The size of them varies like this 70% from them have their length 1MB 29% from them have their length between 1MB and 10 MB 1% from them have their length 10MB (they can have also 100MB) What David says above though

Re: HBase region size

2011-07-01 Thread Eric Charles
On 01/07/11 10:23, Andrew Purtell wrote: From: Stackst...@duboce.net 3. The size of them varies like this 70% from them have their length 1MB 29% from them have their length between 1MB and 10 MB 1% from them have their length 10MB (they can have also

Re: HBase region size

2011-07-01 Thread Florin P
...@apache.org wrote: From: Andrew Purtell apurt...@apache.org Subject: Re: HBase region size To: user@hbase.apache.org user@hbase.apache.org Date: Friday, July 1, 2011, 4:23 AM From: Stack st...@duboce.net  3. The size of them varies like this            70% from them have their length

Re: HBase region size

2011-07-01 Thread Andrew Purtell
One reasonable way to handle native storage of large objects in HBase would be to introduce a layer of indirection. Do you see this layer on the client or on the server side? Client side. I was also thinking about the update: Let's say we store a new version of the large object which is

Re: HBase region size

2011-07-01 Thread Sam Seigal
regionserver for better performance and reducing I/O ? Scan reads don't care about file size (bigger may actually be slightly faster). Random read performance also is unrelated to file/region size (We consult the in-memory index to figure where to jump to to start the read -- this should

Re: HBase region size

2011-07-01 Thread Stack
regionserver is carrying all load whereas if you had more regions, the hot section could be distributed about the cluster. I'd rather then keep the region size to unlimited, and if the region gets hot, manually split and move ? Any risk associated with this approach ? Sure. You could do this (In one

Re: HBase region size

2011-07-01 Thread Eric Charles
Hi Andrew, Thx for your replies. I may give a try one day to this indirection layer if someone does not pick it before me :) On 01/07/11 18:34, Andrew Purtell wrote: One reasonable way to handle native storage of large objects in HBase would be to introduce a layer of indirection. Do you

RE: HBase region size

2011-06-29 Thread Florin P
contributing to taking the right decision for using the right tool. Thank you, Florin --- On Tue, 6/28/11, Buttler, David buttl...@llnl.gov wrote: From: Buttler, David buttl...@llnl.gov Subject: RE: HBase region size To: user@hbase.apache.org user@hbase.apache.org Date: Tuesday, June 28

HBase region size

2011-06-28 Thread Aditya Karanth A
size would be 256 MB. 3. Each datanode to have at least 32 TB (Tera Bytes) of disk space. (We may add more data nodes to accommodate 1PB) The question here is, if we have a region size of 256MB, will we still have a problem of too many small files in the Hadoop for the number of regions it may

HBase region size config

2011-06-28 Thread Aditya Karanth A
region size would be 256 MB. 3. Each datanode to have at least 32 TB (Tera Bytes) of disk space. (We may add more data nodes to accommodate 1PB) The question here is, if we have a region size of 256MB, will we still have a problem of too many small files in the Hadoop for the number of regions

RE: HBase region size

2011-06-28 Thread Buttler, David
size. If you have only one CF/column holding a single 5MB object, then you should be fine. * the larger your region size, the less overhead there is for storage, and the fewer total regions you will need. The drawback is that random access will be slower. Given your object size