Re: Pack rows into a wide row for better performance?

2013-08-28 Thread Chris Perluss
I'm still kinda new to HBase so please excuse me if I am wrong. I suspect the reason has to do with a different slide from their presentation where they run a job every hour to combine all the cells from the previous hour into one cell. OpenTSDB has quite a long row key. It contains the metric

Re: Pack rows into a wide row for better performance?

2013-08-28 Thread Chris Perluss
Sorry, accidentally hit send. I'm guessing a 10 minute time slice would drop their space savings from 4-8x down to 2-4x. On Aug 27, 2013 11:30 PM, Chris Perluss tradersan...@gmail.com wrote: I'm still kinda new to HBase so please excuse me if I am wrong. I suspect the reason has to do with a

Re: Data Deduplication in HBase

2013-08-28 Thread Chris Perluss
It might help to pick a granularity level. For example let's suppose you pick a granularity level of 0.1. Any piece of the song you receive should be broken down into segments of 0.1 and they need to be aligned on 0.1. Example: you receive a piece of the song from 0.65 to 0.85. You would break
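
A minimal sketch of the alignment idea, in plain Java with no HBase dependency. Offsets are expressed in integer hundredths (0.65 becomes 65) to avoid floating-point rounding. The class name SegmentAligner and the method fullSegments are hypothetical. The message above is cut off before it says what to do with the partial edges, so keeping only the fully covered segments is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: splits a received piece into
// granularity-aligned segments so identical segments can be deduplicated.
final class SegmentAligner {

    /**
     * Returns the start offsets of every aligned segment that lies fully
     * inside [start, end). All values are integer ticks (e.g. hundredths).
     * Assumption: partial segments at either edge are skipped.
     */
    static List<Long> fullSegments(long start, long end, long granularity) {
        if (granularity <= 0) {
            throw new IllegalArgumentException("granularity must be positive");
        }
        List<Long> starts = new ArrayList<>();
        // Round the start up to the next aligned boundary.
        long first = Math.floorDiv(start + granularity - 1, granularity) * granularity;
        for (long s = first; s + granularity <= end; s += granularity) {
            starts.add(s);
        }
        return starts;
    }
}
```

For the piece from 0.65 to 0.85 with a granularity of 0.1, fullSegments(65, 85, 10) yields the single aligned segment starting at 70, that is, 0.7 to 0.8.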

Re: Hbase 0.94.6 stargate can't use multi get

2013-08-28 Thread Dmitriy Troyan
Hey Ravi, It seems I found what the problem was: when communicating with Stargate I did not set the Accept header to application/json. It was octet-stream, and according to the documentation that can only return one value. Thanks. On Wed, Aug 28, 2013 at 8:46 AM, Dmitriy Troyan troyan.dmit...@gmail.com wrote: Please

Re: Data Deduplication in HBase

2013-08-28 Thread Anand Nalya
Hi Chris, Thanks a lot for the detailed response. I'll definitely try this design and see how it performs. Anand On 28 August 2013 13:56, Chris Perluss tradersan...@gmail.com wrote: It might help to pick a granularity level. For example let's suppose you pick a granularity level of 0.1.

Is it possible to get the region count of a table via an API?

2013-08-28 Thread Pavan Sudheendra
Hi all, I know that we can go to the HBase UI and split our table so that it is distributed over the cluster. Is there a way to get the region count via an API, and possibly to change it? This is to know how many map tasks will run on our table before we actually run the MR job. -- Regards-

Re: Is it possible to get the region count of a table via an API?

2013-08-28 Thread Ashwanth Kumar
To check how many regions you have in a table (and possibly what they are), use HBaseAdmin#getTableRegions (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#getTableRegions(byte[])). In order to split the table you can use

experiencing high latency for few reads in HBase

2013-08-28 Thread Saurabh Yahoo
Hi, We are running a stress test on our 5 node cluster and we are getting the expected mean latency of 10ms. But we are seeing around 20 reads out of 25 million with latency of more than 4 seconds. Can anyone provide insight into what we can do to meet a sub-second SLA for each and every

Re: Is it possible to get the region count of a table via an API?

2013-08-28 Thread Pavan Sudheendra
How do I get the server name associated with the region? On Wed, Aug 28, 2013 at 3:46 PM, Ashwanth Kumar ashwanthku...@googlemail.com wrote: To check how many regions you have in a table (and possibly what they are) HBaseAdmin#getTableRegions

Re: Is it possible to get the region count of a table via an API?

2013-08-28 Thread Surendra , Manchikanti
List<HRegionInfo> = HBaseAdmin#getTableRegions(tablename); HRegionInfo.getServerName(); (Javadoc: http://docs.oracle.com/javase/6/docs/api/java/util/List.html?is-external=true and http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html) Regards, Surendra M -- Surendra Manchikanti On Wed, Aug

Re: Re: Will hbase automatically distribute the data across region servers or NOT..??

2013-08-28 Thread Frank Chow
Hi, According to the error message in the master log, there may be some inconsistencies in the configuration. Check the configuration on all nodes to see whether the property below is configured and whether it is consistent. <property> <name>hbase.metrics.showTableName</name> <value>true</value> </property>

Re: how to export data from hbase to mysql?

2013-08-28 Thread Shahab Yunus
Taking what Ravi Kiran mentioned a level higher, you can also use Pig. It has DBStorage. It is very easy to read from HBase and dump to MySQL if your data porting does not require complex transformation (and even that can be handled in Pig).

Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ameya Kanitkar
Hi All, We have a very heavy map reduce job that goes over an entire table with over 1TB of data in HBase and exports all the data (similar to the Export job but with some additional custom code built in) to HDFS. However, this job is not very stable, and we often get the following error and the job fails:

How to enable metrics2?

2013-08-28 Thread Ionut Ignatescu
Hi, I am using HBase 0.94.11 with Hadoop 1.1.2. I want to improve my current monitoring solution, so I created a custom MetricsSink that exports metrics in a custom format. This solution runs perfectly with Hadoop. Unfortunately, I cannot say the same thing about HBase. I have several questions: 1.

Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Dhaval Shah
Couple of things: - Can you check the resources on the region server for which you get the lease exception? It seems like the server is heavily thrashed - What are your values for scan.setCaching and scan.setBatch?  The lease does not exist exception generally happens when the client goes back
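
A rough back-of-the-envelope check for the scan.setCaching question, as a sketch rather than a diagnosis of this particular job. It assumes the client holds each batch of cached rows for roughly caching times per-row processing time before it calls the region server again, and that the scanner lease is governed by hbase.regionserver.lease.period (60000 ms by default in 0.94, to the best of my knowledge). The class and method names are hypothetical.

```java
// Hypothetical estimator, not part of HBase: compares the time a client
// spends processing one batch of cached rows with the scanner lease period.
final class ScannerLeaseEstimate {

    /** Approximate milliseconds between two consecutive scanner next() RPCs. */
    static long millisBetweenNextCalls(int caching, long perRowMillis) {
        return Math.multiplyExact((long) caching, perRowMillis);
    }

    /** True if processing one cached batch takes longer than the lease period. */
    static boolean likelyToExpire(int caching, long perRowMillis, long leasePeriodMillis) {
        return millisBetweenNextCalls(caching, perRowMillis) > leasePeriodMillis;
    }
}
```

With a caching value of 1000 and 100 ms of work per row, the client would be away for about 100 seconds, longer than a 60 second lease. Lowering the caching value to 100 brings that down to about 10 seconds.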

Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ted Yu
From the log you posted on pastebin, I see the following. Can you check namenode log to see what went wrong ? 1. Caused by: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on

Re: Is it possible to get the region count of a table via an API?

2013-08-28 Thread Ted Yu
You can also use HTable#getRegionLocations(): public NavigableMap<HRegionInfo, ServerName> getRegionLocations() throws IOException { FYI On Wed, Aug 28, 2013 at 6:12 AM, Surendra , Manchikanti surendra.manchika...@gmail.com wrote: List

Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ameya Kanitkar
Thanks for your response. I checked namenode logs and I find following: 2013-08-28 15:25:24,025 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: recoverLease: recover lease [Lease. Holder: DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25, pendingcreates:

Writing data to hbase from reducer

2013-08-28 Thread jamal sasha
Hi, I have data in form: source, destination, connection This data is saved in hdfs I want to read this data and put it in hbase table something like: Column1 (source) | Column2(Destination)| Column3(Connection Type) Rowvertex A| vertex B | connection

Re: Writing data to hbase from reducer

2013-08-28 Thread Doug Meil
MapReduce job reading in your data in HDFS and then emitting Puts against the target table in the Mapper since it looks like there isn't any transform happening... http://hbase.apache.org/book/mapreduce.example.html Likewise, what Harsh said a few days ago. On 8/27/13 6:33 PM, Harsh J

Re: Writing data to hbase from reducer

2013-08-28 Thread Surendra , Manchikanti
HBase comes with a bulk load tool. Please check the link below. http://hbase.apache.org/book/arch.bulk.load.html Regards, Surendra M -- Surendra Manchikanti On Wed, Aug 28, 2013 at 11:39 PM, Doug Meil doug.m...@explorysmedical.com wrote: MapReduce job reading in your data in HDFS and then

Re: Newbie in hbase Trying to run an example

2013-08-28 Thread Doug Meil
cf in this example is a column family, and this needs to exist in the tables (both input and output) before the job is submitted. On 8/26/13 3:01 PM, jamal sasha jamalsha...@gmail.com wrote: Hi, I am new to hbase, so few noob questions. So, I created a table in hbase: A quick scan gives

HBase indexing and updating

2013-08-28 Thread Flavio Pompermaier
Hi to everybody, I have two questions: - My HBase table is composed of a UUID as the key and XML as content in a single column. What is currently the best option to read all that XML, deserialize it to its object representation and add it to Solr (or another indexing system)? The problem

Coprocessor responseTooSlow error messages

2013-08-28 Thread Kiru Pakkirisamy
I keep getting these error messages when I run multiple clients. For a single client, the same table/query gets done in 400 msec. But for 60 clients it jumps to 10 secs (1msec). Any ideas on where the bottleneck could be? Or how to go about debugging this? Regards, - kiru Kiru

RE: experiencing high latency for few reads in HBase

2013-08-28 Thread Vladimir Rodionov
1. A 4 sec max latency is not that bad, taking into account the 12GB heap. It can be much larger. What is your SLA? 2. Block evictions are the result of a poor cache hit rate and the root cause of periodic stop-the-world GC pauses (the max latencies you have been observing in the test). 3.

RE: experiencing high latency for few reads in HBase

2013-08-28 Thread Vladimir Rodionov
Just ignore last part: 'If you don have in_memory column families you may decrease' Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Vladimir Rodionov Sent: Wednesday, August

Re: Coprocessor responseTooSlow error messages

2013-08-28 Thread Ted Yu
Can you post error message ? Thanks On Wed, Aug 28, 2013 at 12:00 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote: I keep getting these error message when I run multiple clients. For a single client, the same table/query gets done in 400 msec. But for 60 clients it jumps to 10 secs

KeyValue.parseColumn

2013-08-28 Thread Vandana Ayyalasomayajula
Hi All, I have been looking at the parseColumn method in the KeyValue class of HBase. The javadoc of that method does not recommend that method be used. I wanted to know if there is any other existing API which can be used in place of the above method. Thanks Vandana

Re: experiencing high latency for few reads in HBase

2013-08-28 Thread Kiru Pakkirisamy
Right 4 sec is good.   @Saurabh - so your read is - getting 20 out of 25 millions rows ?. Is this a Get or a Scan ? BTW, in this stress test how many concurrent clients do you have ?    Regards, - kiru From: Vladimir Rodionov vrodio...@carrieriq.com To:

Re: Coprocessor responseTooSlow error messages

2013-08-28 Thread Kiru Pakkirisamy
Ted, There is no error message, only a WARN. It lists all the arguments to the call. It seems like a resource configuration issue. I am unable to get past 60 concurrent clients or so. And I did set the rpc handler count to 400. (I just deleted the log file so I have to redo it again, it takes

Re: How to enable metrics2?

2013-08-28 Thread Elliott Clark
On Wed, Aug 28, 2013 at 6:55 AM, Ionut Ignatescu ionut.ignate...@gmail.com wrote: "... MetricsSink that export metrics in a custom format. This solution runs ..." Metrics2 support in HBase will be released in 0.96. 0.94.x versions of HBase still use the older metrics system.

RowLocks

2013-08-28 Thread Kristoffer Sjögren
Hi, About the internals of locking a row in HBase: do HBase row locks map one-to-one to locks in ZooKeeper, or are there any optimizations based on the fact that a row only exists on a single machine? Cheers, -Kristoffer

Re: experiencing high latency for few reads in HBase

2013-08-28 Thread Saurabh Yahoo
Hi Vlad, Thanks for your response. 1. Our SLA is less than one sec; we cannot afford latency of more than 1 sec. We can increase the heap size if that helps; we have enough memory on the server. What would be the optimal heap size? 2. Cache hit ratio is 95%. One thing I don't understand that we have

Re: RowLocks

2013-08-28 Thread Ted Yu
RowLock API has been removed in 0.96. Can you tell us your use case ? On Wed, Aug 28, 2013 at 3:14 PM, Kristoffer Sjögren sto...@gmail.comwrote: Hi About the internals of locking a row in hbase. Does hbase row locks map one-to-one with a locks in zookeeper or are there any optimizations

Re: experiencing high latency for few reads in HBase

2013-08-28 Thread Saurabh Yahoo
Thanks Kiru. We need less than 1 sec latency. We are using both multiGet and get. We have three concurrent clients running 10 threads each (that makes a total of 30 concurrent clients). Thanks, Saurabh. On Aug 28, 2013, at 4:30 PM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote: Right 4

Re: RowLocks

2013-08-28 Thread Jean-Marc Spaggiari
Worst case you can use ZK to do the same if you only need that from time to time? Le 2013-08-28 18:19, Ted Yu yuzhih...@gmail.com a écrit : RowLock API has been removed in 0.96. Can you tell us your use case ? On Wed, Aug 28, 2013 at 3:14 PM, Kristoffer Sjögren sto...@gmail.com wrote:

Re: RowLocks

2013-08-28 Thread Kristoffer Sjögren
I want a distributed lock condition for doing certain operations that may or may not be unrelated to hbase. On Thu, Aug 29, 2013 at 12:18 AM, Ted Yu yuzhih...@gmail.com wrote: RowLock API has been removed in 0.96. Can you tell us your use case ? On Wed, Aug 28, 2013 at 3:14 PM, Kristoffer

Re: KeyValue.parseColumn

2013-08-28 Thread Stack
On Wed, Aug 28, 2013 at 12:46 PM, Vandana Ayyalasomayajula avand...@yahoo-inc.com wrote: Hi All, I have been looking at the parseColumn method in the KeyValue class of HBase. The javadoc of that method does not recommend that method be used. I wanted to know if there is any other existing

Re: experiencing high latency for few reads in HBase

2013-08-28 Thread Kiru Pakkirisamy
Saurabh, we are able to 600K rowxcolumns in 400 msec. We have put what was a 40million row table as 400K rows and columns. We Get about 100 of the rows from this 400K , do quite a bit of calculations in the coprocessor (almost a group-order by) and return in this time. Maybe should consider

Re: KeyValue.parseColumn

2013-08-28 Thread Vandana Ayyalasomayajula
Thanks Stack for following up. On Aug 28, 2013, at 3:31 PM, Stack wrote: On Wed, Aug 28, 2013 at 12:46 PM, Vandana Ayyalasomayajula avand...@yahoo-inc.com wrote: Hi All, I have been looking at the parseColumn method in the KeyValue class of HBase. The javadoc of that method does not

RE: experiencing high latency for few reads in HBase

2013-08-28 Thread Vladimir Rodionov
Increasing the Java heap size will actually make latency worse. You can't guarantee 1 sec max latency if you run a Java app (unless your heap size is much less than 1GB). I have never heard of a strict maximum latency limit. Usually it is the 99%, 99.9% or 99.99% query percentile. You can greatly reduce
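
To make the percentile point concrete, here is a small plain-Java sketch that reads a latency percentile off a set of measured samples using the nearest-rank definition. Nearest-rank is only one of several common percentile definitions, and the class name LatencyPercentiles is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentile over recorded read latencies.
final class LatencyPercentiles {

    /**
     * Returns the smallest sample such that at least p percent of all
     * samples are less than or equal to it (nearest-rank method).
     */
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }
}
```

An SLA phrased as "99.9% of reads under 1 sec" is then checked with percentile(samples, 99.9) <= 1000, which tolerates a handful of GC-induced outliers, whereas "every read under 1 sec" amounts to bounding percentile(samples, 100), the maximum.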

Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ameya Kanitkar
Any ideas? Anyone? On Wed, Aug 28, 2013 at 9:36 AM, Ameya Kanitkar am...@groupon.com wrote: Thanks for your response. I checked namenode logs and I find following: 2013-08-28 15:25:24,025 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: recoverLease: recover lease [Lease.

Re: Pack rows into a wide row for better performance?

2013-08-28 Thread 林煒清
Oh, so they have them packed into one cell. If so, it is now reasonable that they claim it speeds up row seeking. Thanks a lot. 2013/8/28 Chris Perluss tradersan...@gmail.com Sorry, accidentally hit send. I'm guessing a 10 minute time slice would drop their space savings from 4-8x down to

Re: RowLocks

2013-08-28 Thread Chris Perluss
You could add an isLocked column to your row. When you want to lock or update your row then use checkAndPut and check that isLocked=0. When unlocking your row then checkAndPut that isLocked=1. You will have effectively locked the row for the purposes of your application without affecting HBase
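
The isLocked protocol can be modelled without a cluster. The sketch below is an in-memory stand-in, not HBase client code: a ConcurrentMap plays the role of the table, and its atomic replace(key, expected, new) stands in for HTable#checkAndPut, which likewise applies a Put only if the current cell value matches the expected one. The class name FlagLockModel is hypothetical, and treating a row that has never been seen as unlocked is an assumption of this sketch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory model of an application-level lock flag stored in a row.
// Assumption: a row without an isLocked value is treated as unlocked (0).
final class FlagLockModel {

    private final ConcurrentMap<String, Integer> isLocked = new ConcurrentHashMap<>();

    /** Lock: succeeds only if isLocked is currently 0, then sets it to 1. */
    boolean tryLock(String row) {
        isLocked.putIfAbsent(row, 0);
        return isLocked.replace(row, 0, 1); // atomic compare-and-set
    }

    /** Unlock: succeeds only if isLocked is currently 1, then sets it to 0. */
    boolean unlock(String row) {
        return isLocked.replace(row, 1, 0);
    }
}
```

A second tryLock on the same row returns false until unlock has run. This simple scheme has no timeout, so a client that dies while holding the flag leaves the row locked until something else resets it.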

Re: RowLocks

2013-08-28 Thread Michael Segel
Ted, Can you clarify... Do you mean the API is no longer a public API, or do you mean no more RLL for atomic writes? On Aug 28, 2013, at 5:18 PM, Ted Yu yuzhih...@gmail.com wrote: RowLock API has been removed in 0.96. Can you tell us your use case ? On Wed, Aug 28, 2013 at 3:14 PM,

Re: RowLocks

2013-08-28 Thread Ted Yu
The API is no longer a public API. Thanks On Wed, Aug 28, 2013 at 7:58 PM, Michael Segel michael_se...@hotmail.com wrote: Ted, Can you clarify... Do you mean the API is no longer a public API, or do you mean no more RLL for atomic writes? On Aug 28, 2013, at 5:18 PM, Ted Yu

Re: RowLocks

2013-08-28 Thread lars hofhansl
Specifically the API has been removed because it had never actually worked correctly. Rowlocks are used by RegionServers for intra-region operations. As such they are ephemeral, in-memory constructs, that cannot reliably outlive a single RPC request. The HTable rowlock API allowed you to

Re: experiencing high latency for few reads in HBase

2013-08-28 Thread lars hofhansl
A 1s SLA is tough in HBase (or any large memory JVM application). Maybe, if you presplit your table, play with JDK7 and the G1 collector, but nobody here will vouch for such an SLA in the 99th percentile. I heard some folks have experimented with 30GB heaps and G1 and have reported max GC