RE: Hbase tuning for heavy write cluster

2014-01-24 Thread Vladimir Rodionov
160 active regions? With 16G of heap and the default 0.4 memstore fraction, your cluster makes tiny flushes, ~40MB in size; you can check the RS log file. A large number of small files triggers frequent minor compactions. The smaller the flush size, the more times the same data will be read/written during compaction.
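Vladimir's arithmetic can be checked directly. The inputs (16 GB heap, the default 0.4 global memstore fraction, 160 regions) come from the thread; the per-region estimate assumes writes are spread evenly across all regions, which is a simplification:

```python
# Rough per-region flush size when the global memstore limit forces flushes,
# assuming writes are spread evenly across all active regions.
heap_mb = 16 * 1024            # 16 GB region server heap
memstore_fraction = 0.4        # default global memstore upper limit
active_regions = 160

flush_mb = heap_mb * memstore_fraction / active_regions
print(f"~{flush_mb:.0f} MB per flush")  # ~41 MB, far below the 128 MB default flush size
```

Each flush this small produces a small HFile, which is what drives the frequent minor compactions described above.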

Re: Hbase tuning for heavy write cluster

2014-01-24 Thread Bryan Beaudreault
We have 5 production HBase clusters, one of which hosts the TSDB data. Two of these clusters (including the TSDB one) have been running 100% Java 7 for up to a couple of weeks. We noticed immediate improvements in latency due to the G1 garbage collector, which works better with higher heap sizes, and no …

Re: Hbase tuning for heavy write cluster

2014-01-24 Thread Rohit Dev
Bryan, this is extremely useful information. I wanted to increase these settings but didn't know how high I could go. BTW, are you running TSDB v1 on Java 7? Thanks. On Fri, Jan 24, 2014 at 6:51 PM, Bryan Beaudreault wrote: > Also, I think you can up the hbase.hstore.blockingStoreFiles quite a bit higher. …

Re: Hbase tuning for heavy write cluster

2014-01-24 Thread Bryan Beaudreault
Also, I think you can up hbase.hstore.blockingStoreFiles quite a bit higher; you could try something like 50. It will reduce read performance a bit, but that shouldn't be too bad, especially for something like OpenTSDB, I think. If you are going to up blockingStoreFiles, you're probably also going …

Re: Hbase tuning for heavy write cluster

2014-01-24 Thread Bryan Beaudreault
It seems from your ingestion rate that you are still blowing through HFiles too fast. You're going to want to up the MEMSTORE_FLUSHSIZE for the table from the default of 128MB. If OpenTSDB is the only thing on this cluster, you can do the math pretty easily to find the maximum allowable, based on your …
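The math Bryan alludes to can be sketched as an upper bound. The 16 GB heap and 0.4 global memstore fraction are from the thread; the count of regions actively receiving writes is a hypothetical input you would measure on your own cluster:

```python
# Upper bound for a per-table MEMSTORE_FLUSHSIZE: if every actively-written
# region filled its memstore at once, the total must stay under the global
# memstore limit, or the region server will force early flushes anyway.
heap_bytes = 16 * 1024**3       # 16 GB heap (from the thread)
global_fraction = 0.4           # default global memstore upper limit
active_write_regions = 40       # hypothetical: regions taking writes at once

max_flush_bytes = heap_bytes * global_fraction / active_write_regions
print(f"max ~{max_flush_bytes / 1024**2:.0f} MB per region")  # ~164 MB here
```

With fewer hot regions the bound rises; with all 160 regions hot it drops back toward the ~40 MB figure from earlier in the thread.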

Re: Hbase tuning for heavy write cluster

2014-01-24 Thread Rohit Dev
Hi Kevin, we have about 160 regions per server, with a 16GB region size and 10 drives for HBase. I've looked at disk I/O and that doesn't seem to be a problem (% utilization is < 2 across all disks). Any suggestion on what heap size I should allocate? Normally I allocate 16GB. Also, I read that increasing …

Re: Hbase tuning for heavy write cluster

2014-01-24 Thread Kevin O'dell
Rohit, a 64GB heap is not ideal; you will run into some weird issues. How many regions are you running per server, how many drives are in each node, and did you change any other settings from the defaults? On Jan 24, 2014 6:22 PM, "Rohit Dev" wrote: > Hi, > > We are running OpenTSDB on a CDH 4.3 HBase cluster, with most of the default settings. …

Hbase tuning for heavy write cluster

2014-01-24 Thread Rohit Dev
Hi, we are running OpenTSDB on a CDH 4.3 HBase cluster with most of the default settings. The cluster is heavy on writes and I'm trying to see what parameters I can tune to optimize write performance. # I get messages related to Memstore [1] and Slow Response [2] very often; is this an indication …

Re: Easiest way to get a random sample of keys

2014-01-24 Thread Dhaval Shah
The HBase shell is a JRuby shell and wraps all Java classes in a Ruby interface. You can actually use a RandomRowFilter with a 5% configuration to achieve what you need. Regards, Dhaval. From: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) To: user@hbase.apache.org Sent: …
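RandomRowFilter keeps each row independently with a fixed probability. The same 5% sampling idea can be sketched in plain Python over a list of keys; this illustrates the filter's semantics only and does not use the HBase API:

```python
import random

def sample_keys(keys, chance=0.05, seed=None):
    """Keep each key independently with probability `chance`,
    mirroring what RandomRowFilter does server-side per row."""
    rng = random.Random(seed)
    return [k for k in keys if rng.random() < chance]

keys = [f"row-{i:05d}" for i in range(10_000)]
sample = sample_keys(keys, chance=0.05, seed=42)
print(len(sample))  # roughly 500 of the 10,000 keys
```

Note the sample size is only approximately 5%: each row is an independent coin flip, so the exact count varies from scan to scan.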

Re: Easiest way to get a random sample of keys

2014-01-24 Thread Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
Fair enough, but I was looking for a larger sample: say, 5% of all the data in my table, which has a few million rows. - Original Message - From: user@hbase.apache.org To: user@hbase.apache.org At: Jan 24 2014 18:29:23 How many regions do you have? Can you take the first key of each region as a sample? …

Re: Easiest way to get a random sample of keys

2014-01-24 Thread Jean-Marc Spaggiari
How many regions do you have? Can you take the first key of each region as a sample? On 2014-01-24 18:15, "Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)" <skada...@bloomberg.net> wrote: > Something like count 't1', {INTERVAL=>20} should give me every 20th row in > table 't1'. Is there an easy way to …

Easiest way to get a random sample of keys

2014-01-24 Thread Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
Something like count 't1', {INTERVAL=>20} should give me every 20th row in table 't1'. Is there an easy way to get a random sample via the shell using filters?

AsyncHBase 1.5.0 has been released

2014-01-24 Thread tsuna
Hi all, after 3 months in RC2 and 2 more months past that, I'm happy to announce that AsyncHBase 1.5.0 is now officially out. AsyncHBase remains true to its initial promise: the API is still backward compatible, but under the hood it continues to work with all production releases of HBase of the past …

Re: Question regarding HBase and Hibernate

2014-01-24 Thread Ted Yu
See the following thread: http://search-hadoop.com/m/MLIjS0K6AM/orm+hbase&subj=Re+hbase+orm On Fri, Jan 24, 2014 at 11:36 AM, Patricia Bechtol <patricia.bech...@humedica.com> wrote: > Hello, > Is there any apache or other project for a hibernate implementation > for HBase? > Thank You > Patricia

Question regarding HBase and Hibernate

2014-01-24 Thread Patricia Bechtol
Hello, is there any Apache or other project for a Hibernate implementation for HBase? Thank you, Patricia Bechtol

Re: HBase Design : Column name v/s Version

2014-01-24 Thread Sagar Naik
I do not have to purge the data; I always need all the versions. But Dhaval raised a valid point about 100K versions and no pagination support based on versions. -Sagar On 1/24/14 11:23 AM, "Vladimir Rodionov" wrote: >One downside of using synthetic versions is you won't be able to use TTL, >which gives you automatic purge of stale data for free. …

RE: HBase Design : Column name v/s Version

2014-01-24 Thread Vladimir Rodionov
One downside of using synthetic versions is that you won't be able to use TTL, which gives you automatic purging of stale data for free. Have you already thought about how to purge old data? Best regards, Vladimir Rodionov, Principal Platform Engineer, Carrier IQ, www.carrieriq.com, e-mail: vrodio...@carrieriq.com

Re: HBase Design : Column name v/s Version

2014-01-24 Thread Dhaval Shah
Theoretically that could work. However, it does seem like a weird way of doing what you want to do, and you might run into unforeseen issues. One issue I see is that 100k versions sounds a bit scary: you can paginate through columns, but not through versions on the same column, for example. Regards, Dhaval

Re: HBase Design : Column name v/s Version

2014-01-24 Thread Sagar Naik
On a related note (https://issues.apache.org/jira/browse/HBASE-4102): it says fixed in 0.94.0, and I am on 0.94.8. Can I use the Append operation for this? -Sagar On 1/24/14 10:46 AM, "Sagar Naik" wrote: >Thanks for clarifying, > >I will be using custom version numbers (auto incrementing on the client side) …

Re: HBase Design : Column name v/s Version

2014-01-24 Thread Sagar Naik
Thanks for clarifying. I will be using custom version numbers (auto-incrementing on the client side), not timestamps. Two clients do not update the same row. -Sagar On 1/24/14 10:33 AM, "Dhaval Shah" wrote: >I am talking about schema 2. Schema 1 would definitely work. Schema 2 can >have the version collisions if you decide to use timestamps as versions. …

Re: HBase Design : Column name v/s Version

2014-01-24 Thread Dhaval Shah
I am talking about schema 2. Schema 1 would definitely work. Schema 2 can have version collisions if you decide to use timestamps as versions. Regards, Dhaval - Original Message - From: Sagar Naik To: "user@hbase.apache.org"; Dhaval Shah Sent: Friday, 24 January 2014 1:07 …

Re: HBase ExportSnapshot fails with ClassNotFoundException: org.apache.hadoop.hbase.TableName

2014-01-24 Thread bob
Yes, I used this command, except that I changed the port to 8022. As far as I can tell, nothing is listening on 8082 (I have CDH 5 installed with HBase, HDFS, YARN and ZooKeeper, all up and running in the default configuration). The last lines I see are: 14/01/24 13:27:50 INFO mapreduce.Job: Job job_13905812…

Re: HBase Design : Column name v/s Version

2014-01-24 Thread Sagar Naik
I am not sure I understand you correctly; I assume you are talking about schema 1. In this case I am appending the version number to the column name, so the column names are different (data_1/data_2) for value_1 and value_2 respectively. -Sagar On 1/24/14 9:47 AM, "Dhaval Shah" wrote: >Versions in HBase are timestamps by default. …

Re: HBase ExportSnapshot fails with ClassNotFoundException: org.apache.hadoop.hbase.TableName

2014-01-24 Thread Ted Yu
Did you use a command similar to the one given in http://hbase.apache.org/book.html#ops.snapshots.export ? BTW, you can find out which process uses a given port with the following command: netstat -tulpn | grep … Cheers On Fri, Jan 24, 2014 at 9:29 AM, bob wrote: > Hello, > > Has anyone experienced …

HBase ExportSnapshot fails with ClassNotFoundException: org.apache.hadoop.hbase.TableName

2014-01-24 Thread bob
Hello, has anyone experienced this issue? HBase ExportSnapshot fails with ClassNotFoundException: org.apache.hadoop.hbase.TableName. I am using CDH 5 Beta 1. And by the way, what is port 8082, given in the example? I couldn't find any service using this port. Thanks, Vadim

Re: HBase Design : Column name v/s Version

2014-01-24 Thread Dhaval Shah
Versions in HBase are timestamps by default. If you intend to continue using timestamps, what will happen when someone writes value_1 and value_2 at exactly the same time? Regards, Dhaval - Original Message - From: Sagar Naik To: "user@hbase.apache.org" Sent: Friday, 24 January …
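Dhaval's collision point can be illustrated with a toy cell map keyed the way HBase addresses a cell: (row, qualifier, version). This is a simplified model for illustration, not the actual HBase storage behavior in every edge case:

```python
# A cell is addressed by (row, column qualifier, version). If the version is
# a wall-clock timestamp and two writers collide on it, the coordinates are
# identical and only one value survives.
cells = {}

def put(row, qualifier, value, version):
    cells[(row, qualifier, version)] = value

ts = 1390608000000  # same millisecond timestamp for both writes
put("row1", "data", "value_1", ts)
put("row1", "data", "value_2", ts)

print(len(cells))  # 1 -- the second write silently replaced the first
```

This is exactly why the thread steers toward client-assigned, monotonically increasing version numbers instead of raw timestamps.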

Re: HBase Design : Column name v/s Version

2014-01-24 Thread Ted Yu
Please see http://hbase.apache.org/book.html#schema.versions On Fri, Jan 24, 2014 at 9:27 AM, Sagar Naik wrote: > Hi, > > I have a choice to maintain the data either in column values or as > versioned data. This data is not a versioned copy per se. > > The access pattern on this: get all the data …

HBase Design : Column name v/s Version

2014-01-24 Thread Sagar Naik
Hi, I have a choice to maintain the data either in column values or as versioned data. This data is not a versioned copy per se. The access pattern is to get all the data every time. So the schema choices are: Schema 1: 1. column_name/qualifier => data_1, column_value => value_1; 1.a. column_nam…
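The two schema choices in the thread can be sketched as toy data layouts. The qualifier names (data_1, data_2) are from the message; the values and the dict modeling are illustrative only:

```python
# Schema 1: encode the version in the column qualifier, so every value is a
# separate column in the same row.
schema_1 = {
    "row1": {"data_1": "value_1", "data_2": "value_2"},
}

# Schema 2: a single qualifier with multiple versions, modeled here as a
# version -> value map under one column (HBase's version mechanism).
schema_2 = {
    "row1": {"data": {1: "value_1", 2: "value_2"}},
}

# Both layouts return everything on a full read, matching the stated access
# pattern of "get all the data every time".
print(sorted(schema_1["row1"].values()) == sorted(schema_2["row1"]["data"].values()))
```

The later replies in this thread highlight the trade-off: schema 1 supports column pagination, while schema 2 risks version collisions (with timestamps) and cannot paginate across 100k versions.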

Re: HBase MapReduce problem

2014-01-24 Thread daidong
Thanks Ted. I actually tried to modify HBase, so I chose this developer release. So you are thinking this is a version problem that should disappear if I switch to 0.96? 2014/1/24 Ted Yu > Why do you use 0.95, which was a developer release? > > See http://hbase.apache.org/book.html#d243e520 >

Re: HBase MapReduce problem

2014-01-24 Thread Ted Yu
Why do you use 0.95, which was a developer release? See http://hbase.apache.org/book.html#d243e520 Cheers On Fri, Jan 24, 2014 at 8:40 AM, daidong wrote: > Dear all, > > I have a simple HBase MapReduce application and try to run it on a > 12-node cluster using this command: > > HADOOP_CLASSPATH=`bin/hbase classpath` …

HBase MapReduce problem

2014-01-24 Thread daidong
Dear all, I have a simple HBase MapReduce application and I'm trying to run it on a 12-node cluster using this command: HADOOP_CLASSPATH=`bin/hbase classpath` ~/hadoop-1.1.2/bin/hadoop jar .jar org.test.WordCount The HBase version is 0.95.0, but I got this error: java.lang.RuntimeException: org.apac…