Re: Fixing badly distributed table manually.

2012-09-05 Thread David Koch
Hello, I also found this fairly recent script here which can be used with Gnuplot to get a visual representation of data distribution across nodes: http://bobcopeland.com/blog/2012/04/graphing-hbase-splits/ Again, my JRuby skills are non-existent so just blindly running the script versus HBase 0

AccessDeniedException on the regionserver with security enabled

2012-09-05 Thread Ivan Frain
Hi all, My config: - Linux ubuntu 12.04 - java 1.7.0_05 from oracle - hbase 0.94.1 compiled with "-Dhadoop.profile=2.0 -Psecurity -Prelease" - hadoop-2.0.0-alpha - zookeeper-3.4.3 - mit kdc with REALM=HADOOP.LAN - all is running on a single ubuntu box which hostname is kdc.hadoop.lan HDFS

reduce influence of auto-splitting region

2012-09-05 Thread jing wang
Hi there, Using HBase as realtime storage (7*24h), how can we reduce the influence of region auto-splitting? Any advice will be appreciated! Thanks, Jing

Re: Extremely slow when loading small amount of data from HBase

2012-09-05 Thread n keywal
Hi, With 8 regionservers, yes, you can. Target a few hundreds by default imho. N. On Wed, Sep 5, 2012 at 4:55 AM, 某因幡 wrote: > +HBase users. > > > -- Forwarded message -- > From: Dmitriy Ryaboy > Date: 2012/9/4 > Subject: Re: Extremely slow when loading small amount of data fr

Re: Help with troubleshooting the HBase replication setup

2012-09-05 Thread Stas Maksimov
This issue is now solved. Having installed two new clusters, everything works as expected. Thanks, Stas On Tue, Sep 4, 2012 at 4:28 PM, Stas Maksimov wrote: > Hi there, > > I'm trying to set up replication in master-slave mode between two > clusters, and when this works set up master-master rep

Re: reduce influence of auto-splitting region

2012-09-05 Thread Jean-Marc Spaggiari
Hi Jing, If you pre-split your regions a lot, you will reduce the number and the influence of the auto-splits. But for that you need to know very well the way the data is going to come into your database to make sure you split your regions evenly. JM 2012/9/5, jing wang : > Hi there, > > Using
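JM's advice is to create the table pre-split. In the Java client the split points are passed to HBaseAdmin.createTable(desc, splits); the sketch below only shows how evenly spaced split keys might be computed for single-byte-prefixed row keys — the key layout and region count are illustrative assumptions, not from the thread:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: compute N-1 evenly spaced split keys dividing the one-byte
// prefix space 0x00..0xFF into `regions` parts. In a real cluster these
// would be handed to HBaseAdmin.createTable(tableDescriptor, splitKeys).
public class PreSplit {
    static byte[][] splitKeys(int regions) {
        List<byte[]> keys = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            keys.add(new byte[] { (byte) (i * 256 / regions) });
        }
        return keys.toArray(new byte[0][]);
    }

    public static void main(String[] args) {
        for (byte[] k : splitKeys(4)) {
            System.out.printf("split at 0x%02X%n", k[0] & 0xFF); // 0x40, 0x80, 0xC0
        }
    }
}
```

As JM says, this only pays off if the incoming keys really do spread across those ranges.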

cannot create a table having a snappy compression algorithm in mac os x.

2012-09-05 Thread Henry JunYoung KIM
Hi, hbase users. I have a problem creating a table with the snappy algorithm in Mac OS X (Lion). I am sure that there is a snappy jar file in $HBASE_HOME/lib, but the table can't be created without errors. environment : hbase version : 0.92.1 distribution mode : pseudo-distributed (1

Re: Extremely slow when loading small amount of data from HBase

2012-09-05 Thread Jean-Marc Spaggiari
But I think you should also look at why you have so many regions... Because even if you merge them manually now, you might face the same issue soon. 2012/9/5, n keywal : > Hi, > > With 8 regionservers, yes, you can. Target a few hundreds by default imho. > > N. > > On Wed, Sep 5, 2012 at 4:55 AM, 某因

Re: reduce influence of auto-splitting region

2012-09-05 Thread jing wang
Hi JM, Thanks for your reply. One more question: what do you mean by 'the way'? The rowkey ranges, just as the hbase book says? http://hbase.apache.org/book/perf.writing.html Thanks, Jing Wang 2012/9/5 Jean-Marc Spaggiari > Hi Jing, > > If you pre-split your regions a lot, you will reduce

RE: reduce influence of auto-splitting region

2012-09-05 Thread Ramkrishna.S.Vasudevan
You can use the property hbase.hregion.max.filesize. You can set this to a higher value and control the splits through your application. Regards Ram > -Original Message- > From: jing wang [mailto:happygodwithw...@gmail.com] > Sent: Wednesday, September 05, 2012 3:48 PM > To: user@hbase.a
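Ram's suggestion corresponds to an hbase-site.xml entry. A sketch of that setting — the 10 GB value below is an illustrative choice that makes automatic splits rare so the application can control splitting itself, not a recommendation from the thread:

```xml
<!-- hbase-site.xml: raise the per-region store file size threshold so
     regions rarely reach it; 10 GB is only an example value. -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value>
</property>
```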

Re: reduce influence of auto-splitting region

2012-09-05 Thread jing wang
Hi Ram, Thanks for your advice. We did consider what you said. HBase is used as realtime storage, just like mysql/oracle. When a region splits, hbase may trigger a 'stop the world' or otherwise long full gc. Our application can't accept this. Thanks, Jing Wang 2012/9/5 Ramkrishna.S.Vasudevan > Y

Re: Fixing badly distributed table manually.

2012-09-05 Thread Vincent Barat
Hi, Balancing regions between RSs is correctly handled by HBase: I mean that your RSs always manage the same number of regions (the balancer takes care of it). Unfortunately, balancing all the regions of one particular table between the RSs of your cluster is not always easy, since HBase (as

RE: reduce influence of auto-splitting region

2012-09-05 Thread Ramkrishna.S.Vasudevan
Hi JingWang Region splits do not necessarily cause GC problems. Based on your use case we may need to configure the heap space for the RS. Coming back to region splits, presplitting the tables at creation is a good option. Assume a case where I know that the data that is going to come into hbase

Re: reduce influence of auto-splitting region

2012-09-05 Thread jing wang
Hi Ram, How to drive the data to the specific hourly region? Use the code like http://hbase.apache.org/book/perf.writing.html? Thanks, Jing Wang 2012/9/5 Ramkrishna.S.Vasudevan > Hi JingWang > > It is not necessary that region split can cause GC problems. Based on your > use case we may ne

Re: Extremely slow when loading small amount of data from HBase

2012-09-05 Thread Doug Meil
You have 4000 regions on an 8 node cluster? I think you need to bring that *way* down… re: "something like 40 regions" Yep… around there. See… http://hbase.apache.org/book.html#bigger.regions On 9/5/12 8:06 AM, "Jean-Marc Spaggiari" wrote: >But I think you should also look at wh

RegionServer not shutting down in a timely manner

2012-09-05 Thread Jeff Whiting
I had to stop all the region servers in my cluster but they get stuck for a long time. However they will eventually shut down. I think this may be the reason for the long shutdown time: 2012-09-05 09:26:31,436 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Encountered

Re: batch update question

2012-09-05 Thread Lin Ma
Thank you Stack for the detailed directions! 1. You are right, I have not met with any real row contention issues. My purpose is understanding the issue in advance, and also from this issue to understand HBase generals better; 2. For the comments from API Url page you referred -- "If isAutoFlush

BigDecimalColumnInterpreter

2012-09-05 Thread Julian Wissmann
Hi, I am currently experimenting with the BigDecimalColumnInterpreter from https://issues.apache.org/jira/browse/HBASE-6669. I was thinking the best way for me to work with it would be to use the Java class and just use that as is. Imported it into my project and tried to work with it as is, by

Re: BigDecimalColumnInterpreter

2012-09-05 Thread Ted Yu
You haven't told us the schema of your table yet. Your table should have a column whose value can be interpreted by BigDecimalColumnInterpreter. Cheers On Wed, Sep 5, 2012 at 9:17 AM, Julian Wissmann wrote: > Hi, > > I am currently experimenting with the BigDecimalColumnInterpreter from > https://

Re: batch update question

2012-09-05 Thread Doug Meil
Hi there, if you look in the source code for HTable there is a list of Put objects. That's the buffer, and it's a client-side buffer. On 9/5/12 12:04 PM, "Lin Ma" wrote: >Thank you Stack for the details directions! > >1. You are right, I have not met with any real row contention issues. My
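Doug's description — a client-side list of Put objects flushed in batches — can be modeled in plain Java. This is a simplified sketch of the buffering behavior, not the real HTable code (the actual client flushes via flushCommits() or when writeBufferSize is exceeded):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of HTable's client-side write buffer: puts accumulate
// locally and are sent in one batch when the buffer reaches a threshold.
// Illustrative sketch only, not the real HBase client implementation.
public class WriteBufferModel {
    private final List<String> buffer = new ArrayList<>();
    private final int flushThreshold;
    int flushes = 0; // number of batched round-trips made

    WriteBufferModel(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void put(String row) {
        buffer.add(row);                            // buffered client-side, no RPC yet
        if (buffer.size() >= flushThreshold) flush();
    }

    void flush() {                                  // one batched "RPC" for all buffered puts
        if (!buffer.isEmpty()) { flushes++; buffer.clear(); }
    }

    public static void main(String[] args) {
        WriteBufferModel t = new WriteBufferModel(100);
        for (int i = 0; i < 250; i++) t.put("row-" + i);
        t.flush();                                  // final explicit flush, like flushCommits()
        System.out.println("round-trips: " + t.flushes); // 3 instead of 250
    }
}
```

The point of the buffer is exactly this trade: fewer round-trips in exchange for puts sitting unsent on the client until a flush.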

Re: batch update question

2012-09-05 Thread Doug Meil
Hi there, for more information about the hbase client, seeŠ http://hbase.apache.org/book.html#client On 9/5/12 12:59 PM, "Doug Meil" wrote: > >Hi there, if you look in the source code for HTable there is a list of Put >objects. That's the buffer, and it's a client-side buffer. > > > > > >

Re: BigDecimalColumnInterpreter

2012-09-05 Thread Julian Wissmann
Hi, the schema looks like this: RowKey: id,timerange_timestamp,offset (String) Qualifier: Offset (long) Timestamp: timestamp (long) Value: number (BigDecimal) Or as code, when I read data from csv: byte[] value = Bytes.toBytes(BigDecimal.valueOf(Double.parseDouble(cData[2]))); Cheers, Julian 2012/
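For a value column like Julian's, the byte round trip is what the interpreter depends on. The sketch below is a plain-Java illustration of a scale-plus-unscaled-bytes encoding; HBase's Bytes utility uses a similar layout for BigDecimal, but this is an assumption-laden sketch, not its exact code:

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Sketch: encode a BigDecimal as a 4-byte scale followed by the unscaled
// value's bytes, then decode it back. Shows the lossless round trip a
// column interpreter relies on; not byte-for-byte identical to
// HBase's Bytes.toBytes(BigDecimal).
public class DecimalCodec {
    static byte[] encode(BigDecimal v) {
        byte[] unscaled = v.unscaledValue().toByteArray();
        return ByteBuffer.allocate(4 + unscaled.length)
                .putInt(v.scale()).put(unscaled).array();
    }

    static BigDecimal decode(byte[] b) {
        ByteBuffer buf = ByteBuffer.wrap(b);
        int scale = buf.getInt();
        byte[] unscaled = new byte[buf.remaining()];
        buf.get(unscaled);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    public static void main(String[] args) {
        BigDecimal v = BigDecimal.valueOf(Double.parseDouble("23.42"));
        System.out.println(decode(encode(v))); // 23.42, no precision lost
    }
}
```

Unlike going through double for aggregation, nothing is rounded here, which is Julian's stated reason for staying with BigDecimal.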

Re: BigDecimalColumnInterpreter

2012-09-05 Thread Ted Yu
And your HBase version is ? Since you use Double.parseDouble(), looks like it would be more efficient to develop DoubleColumnInterpreter. On Wed, Sep 5, 2012 at 12:07 PM, Julian Wissmann wrote: > Hi, > the schema looks like this: > RowKey: id,timerange_timestamp,offset (String) > Qualifier: Offs

Re: BigDecimalColumnInterpreter

2012-09-05 Thread Julian Wissmann
I get supplied with doubles from sensors, but in the end I lose too much precision if I do my aggregations on double, otherwise I'd go for it. I use 0.92.1, from Cloudera CDH4. I've done some initial testing with LongColumnInterpreter on a dataset that I've generated, to do some testing and get ac

Re: BigDecimalColumnInterpreter

2012-09-05 Thread Ted Yu
I added one review comment on HBASE-6669. Thanks Julian for reminding me. On Wed, Sep 5, 2012 at 12:49 PM, Julian Wissmann wrote: > I get supplied with doubles from sensors, but in the end I loose too much > precision if I do my aggregations on

Re: BigDecimalColumnInterpreter

2012-09-05 Thread Julian Wissmann
Thank you! So this looks like the missing link here. I'll see if I can get it working, tomorrow morning. Cheers 2012/9/5 Ted Yu > I added one review comment on > HBASE-6669 > . > > Thanks Julian for reminding me. > > On Wed, Sep 5, 2012 at 12:49

Re: BigDecimalColumnInterpreter

2012-09-05 Thread anil gupta
Hi Julian, I have been running the same class on my distributed cluster for aggregation. It has been working fine. The only difference is that I use the methods provided in the com.intuit.ihub.hbase.poc.aggregation.client.AggregationClient class. IMHO, you don't need to define an Endpoint for using the

Re: BigDecimalColumnInterpreter

2012-09-05 Thread anil gupta
Hi Julian, Sorry for the wrong reference to the aggregation client class in my previous email. Here is the right class: org.apache.hadoop.hbase.client.coprocessor.AggregationClient. HTH, Anil On Wed, Sep 5, 2012 at 2:04 PM, anil gupta wrote: > Hi Julian, > > I have been running the same class on my

Re: why hbase doesn't provide Encryption

2012-09-05 Thread Julian Wissmann
What problem are you trying to solve? Do you want encryption between server and client, between servers or encryption of data within Hbase? You need to be more specific. If one of the first two is what you want: This kind of stuff can easily be achieved with stunnel or OpenVPN and can probably be

RS not processing any requests

2012-09-05 Thread Nathaniel Cook
We are experiencing a problem where RS are locking up and not processing any requests. Restarting the RS will fix the problem and operations will continue as normal. We are experiencing this issue under load and on two different clusters. We are importing existing data via the hbase mapreduce impor

Re: confused by two add method of Put class

2012-09-05 Thread Julian Wissmann
They do the exact same thing. In fact the non-KV add looks like this: public Put add(byte [] family, byte [] qualifier, long ts, byte [] value) { List<KeyValue> list = getKeyValueList(family); KeyValue kv = createPutKeyValue(family, qualifier, ts, value); list.add(kv); familyMap.put(kv.g

Re: RS not processing any requests

2012-09-05 Thread Stack
On Wed, Sep 5, 2012 at 2:58 PM, Nathaniel Cook wrote: > We ran a jstack on the both the RS process and the hbase shell process > trying to do the scan. > > Jstack log for RS: > http://pastebin.com/9Y9t5ERE > What JVM (I don't know what (20.10-b01 mixed mode) is). I see a bunch of this: "PRI IP

Re: RS not processing any requests

2012-09-05 Thread Himanshu Vashishtha
Your RS priority handlers are blocked on a meta lookup, so the RS becomes unresponsive. Looks like you're hitting https://issues.apache.org/jira/browse/HBASE-6165 Are you running HBase replication? Just confirming. Himanshu On Wed, Sep 5, 2012 at 4:39 PM, Stack wrote: > On Wed, Sep 5, 2012 at 2:58 PM, Nathan

Re: RS not processing any requests

2012-09-05 Thread Jeff Whiting
I work with Nathaniel and can answer those questions. We are using Sun's jvm. $ java -version java version "1.6.0_21" Java(TM) SE Runtime Environment (build 1.6.0_21-b06) Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode) We also tried one node on a newer version but saw the same th

Re: RS not processing any requests

2012-09-05 Thread Jeff Whiting
Yes we are running hbase replication. ~Jeff On 9/5/2012 4:47 PM, Himanshu Vashishtha wrote: Your RS priority handlers are blocked on meta lookup, so it becomes unresponsive. Looks like you hitting https://issues.apache.org/jira/browse/HBASE-6165 You running HBase replication? just confirming.

Re: RS not processing any requests

2012-09-05 Thread Jeff Whiting
It looks like that is the problem we are having. We are on 0.92 so we don't get the patch. But one solution seems to be increasing the number of priority handlers. How do we increase the number of priority handlers? ~Jeff On 9/5/2012 4:47 PM, Himanshu Vashishtha wrote: Your RS priority handlers are b

Re: RS not processing any requests

2012-09-05 Thread Himanshu Vashishtha
The number of PRI handlers is governed by "hbase.regionserver.metahandler.count"; the default is 10. Increasing their number will not solve it, but will delay its occurrence (I don't know about your load etc). Another related jira is HBASE-6550. Some more context for your use case: http://search-hadoop.
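The property Himanshu names lives in hbase-site.xml on the region servers. A sketch of the override — 30 is an arbitrary example value, and as he points out, raising it only postpones the lockup rather than fixing it:

```xml
<!-- hbase-site.xml: number of priority (meta) handlers per region server.
     Default is 10; 30 is only an example value, not a recommendation. -->
<property>
  <name>hbase.regionserver.metahandler.count</name>
  <value>30</value>
</property>
```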

Re: RS not processing any requests

2012-09-05 Thread Jeff Whiting
hmm. So if we are on 0.92 what suggestion would you have to prevent the problem? ~Jeff On 9/5/2012 5:23 PM, Himanshu Vashishtha wrote: Number of PRI handlers are governed by "hbase.regionserver.metahandler.count"; default is 10. Increasing their number will not solve it, but will delay its o

Re: RS not processing any requests

2012-09-05 Thread Himanshu Vashishtha
It usually happens in a long running setup (at least for me). Can you throttle your load? Replication is evolving; I'd say update if you can (or backport the jiras?). Himanshu On Wed, Sep 5, 2012 at 5:53 PM, Jeff Whiting wrote: > hmm. So if we are on 0.92 what suggestion would you have to pre

Re: RS not processing any requests

2012-09-05 Thread Ted Yu
Backport has been done in HBASE-6724 Cheers On Wed, Sep 5, 2012 at 5:09 PM, Himanshu Vashishtha wrote: > It usually happens in a long running setup (at least for me). Can you > throttle your load? > > Replication is evolving; I'd say update if you can (or backport the > jiras?). > > Himanshu >

Re: Extremely slow when loading small amount of data from HBase

2012-09-05 Thread 某因幡
Yes, hbase.hregion.max.filesize was set to the default 256MB and it was too low. 2012/9/5 Jean-Marc Spaggiari : > But I think you should also look at why we have so many regions... > Because even if you merge them manually now, you might face the same > issu soon. > > 2012/9/5, n keywal : >> Hi, >> >>

Re: why hbase doesn't provide Encryption

2012-09-05 Thread Farrokh Shahriari
Hi there I want encryption before pushing data into hbase, and I've done it by changing some library code to encrypt data. But it performs poorly: for each row it must encrypt/decrypt the cell, so a query that touches a lot of rows will take a long time. My question is this: why

Re: why hbase doesn't provide Encryption

2012-09-05 Thread Stack
On Wed, Sep 5, 2012 at 10:31 PM, Farrokh Shahriari wrote: > But it doesn't have any > performance,I mean for each row it should encrypt/decrypt the cell,so for a > query that has a lot of rows ,it will take a long time. How else would you see it working? (We can't do a row at a time given ou

RE: reduce influence of auto-splitting region

2012-09-05 Thread Ramkrishna.S.Vasudevan
Yes. The row keys generated should fall in the range of one of the region's start and end keys. So HBase internally can take care of distributing to the specified region server. As mentioned in http://hbase.apache.org/book/perf.writing.html, we also need to take care of not making one parti
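One common way to keep generated keys landing across all pre-split regions, and to avoid hammering a single hot region, is a salt-bucket prefix. This is a hypothetical sketch, not something from the thread; the bucket count and key layout are illustrative assumptions:

```java
// Sketch: prefix each row key with a bucket derived from its hash so writes
// distribute across regions pre-split on the bucket prefix ("00".."07").
// Bucket count and key layout are illustrative assumptions.
public class SaltedKey {
    static final int BUCKETS = 8;

    static String rowKey(String naturalKey) {
        int bucket = Math.abs(naturalKey.hashCode() % BUCKETS);
        return String.format("%02d-%s", bucket, naturalKey);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("sensor42-2012090512"));
    }
}
```

The trade-off: salting spreads the write load, but a sequential scan over the natural key order then has to fan out across all buckets.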

Re: why hbase doesn't provide Encryption

2012-09-05 Thread Farrokh Shahriari
Thanks Stack for giving your time to me. M.Zebeleh On Thu, Sep 6, 2012 at 10:06 AM, Stack wrote: > On Wed, Sep 5, 2012 at 10:31 PM, Farrokh Shahriari > wrote: > > But it doesn't have any > > performance,I mean for each row it should encrypt/decrypt the cell,so > for a > > query that has a lo