Re: Use of MD5 as row keys - is this safe?

2012-07-20 Thread Anton Lyska
Hi, I use reversed hex for auto-incremented ids. For example: id=123456, row key=042E1; id=123457, row key=142E1. I started using this approach recently, and it seems to work pretty well. All regions are distributed uniformly, with no hot-spotting. 2012/7/20 Jonathan Bishop > Hi, > > I know it i
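
The reversed-hex scheme in the message above can be sketched in a few lines of Python. This is only an illustration: the function name reversed_hex_key is made up here, and the post does not say whether the ids are zero-padded to a fixed width, so no padding is applied.

```python
def reversed_hex_key(record_id: int) -> str:
    """Return the uppercase hex digits of record_id in reverse order.

    Hypothetical helper: the name and the lack of zero-padding are
    assumptions, not something stated in the original post.
    """
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    # format(..., "X") yields uppercase hex without a "0x" prefix;
    # slicing with [::-1] reverses the digit string.
    return format(record_id, "X")[::-1]


if __name__ == "__main__":
    # The two ids quoted in the post.
    for record_id in (123456, 123457):
        print(record_id, reversed_hex_key(record_id))
```

Because the fastest-changing digit ends up first, consecutive ids start with different characters, which is what spreads sequential writes across regions.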

Re: HBase shell

2012-07-20 Thread Mohit Anchlia
On Fri, Jul 20, 2012 at 6:18 PM, Dhaval Shah wrote: > Mohit, HBase shell is a JRuby wrapper and as such has all the functions > available that are available using the Java API. So you can import the Bytes > class and then do a Bytes.toString() similar to what you'd do in Java > > Ah I see, you mean I cha

Re: HBase shell

2012-07-20 Thread Dhaval Shah
Mohit, HBase shell is a JRuby wrapper and as such has all the functions available that are available using the Java API. So you can import the Bytes class and then do a Bytes.toString() similar to what you'd do in Java. Regards, Dhaval From: Mohit Anchlia To: user@hba

Can serially code software utilize Hadoop benefits

2012-07-20 Thread kbrownk
I'm looking to set up a Hadoop cluster to take advantage of its parallel architecture. I'd like to use R, SAS, or equivalent to run data mining or analytics models. Since these analytics will by and large be serial, will there be no benefit to using Hadoop over a single server? That is, does the sof

Re: Question about regions splitting

2012-07-20 Thread Jimmy Xiang
At the very beginning, the two daughter regions are on the same region server as the parent region. But they can be moved to other region servers by the region balancer. HFiles of a region may not be on the same host as the region. Thanks, Jimmy On Fri, Jul 20, 2012 at 2:44 PM, Haijia Zhou wrote: >

Question about regions splitting

2012-07-20 Thread Haijia Zhou
I have a question about how regions split. Let's say we have a 2GB region and upon splitting the region will be split into two 1GB sub-regions. My first question is: will the two 1GB sub-regions always be on the same host as the parent region? My second question is: Let's say the HDFS block size

Re: Java Client Tutorial

2012-07-20 Thread Asaf Mesika
The best examples I saw were in the book HBase - The Definitive Guide. -- Asaf Mesika On Friday 20 July 2012 at 22:25, Mohit Anchlia wrote: > Is there any place that has good examples of HBase java API calls? > >

Re: Java Client Tutorial

2012-07-20 Thread Stack
On Fri, Jul 20, 2012 at 9:25 PM, Mohit Anchlia wrote: > Is there any place that has good examples of HBase java API calls? http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/package-summary.html#package_description ... which is referred to from the reference guide. For further sampl

Re: Use of MD5 as row keys - is this safe?

2012-07-20 Thread Joe Pallas
On Jul 20, 2012, at 12:16 PM, Michel Segel wrote: > I don't believe that there have been any reports of collisions, but if you > are concerned you could use SHA-1 for generating the hash. Relatively > speaking, SHA-1 is slower, but still fast enough for most applications. Every hash functio

RE: Use of MD5 as row keys - is this safe?

2012-07-20 Thread Rob Roland
I use a SHA1 hash of an identifier as a rowkey and store the unhashed version in a "metadata" column family. Makes for a good distribution of keys and an easy thing to pre-split tables on. From: Michel Segel Sent: 7/20/2012 12:16 PM To: user@hbase.apache.org Cc: user@hbase.apache.org Subject: Re: U
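
As a rough sketch of the approach described above (not the poster's actual code), the following Python derives a SHA-1 row key, keeps the unhashed identifier under the "metadata" column family, and computes evenly spaced split points for pre-splitting. The qualifier name original_id, the dictionary layout, and the one-byte split boundaries are assumptions made for illustration only.

```python
import hashlib


def sha1_row_key(identifier: str) -> bytes:
    """Return the 20-byte SHA-1 digest of the identifier, used as the row key."""
    return hashlib.sha1(identifier.encode("utf-8")).digest()


def build_row(identifier: str) -> dict:
    """Pair the hashed row key with the unhashed identifier.

    "metadata" is the column family named in the post; the qualifier
    "original_id" is a hypothetical name chosen for this sketch.
    """
    return {
        "row_key": sha1_row_key(identifier),
        "metadata:original_id": identifier.encode("utf-8"),
    }


def split_points(num_regions: int) -> list:
    """Return num_regions - 1 one-byte boundaries that divide the key space evenly.

    Works because the first byte of a hash digest is roughly uniform over
    0..255; one-byte granularity is a simplification (assumption).
    """
    if not 1 <= num_regions <= 256:
        raise ValueError("num_regions must be between 1 and 256")
    return [bytes([i * 256 // num_regions]) for i in range(1, num_regions)]


if __name__ == "__main__":
    row = build_row("user-42")
    print(row["row_key"].hex(), row["metadata:original_id"])
    print([p.hex() for p in split_points(4)])
```

Since hashed keys are spread roughly uniformly over the byte range, split points can be chosen up front without looking at the data.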

Re: Use of MD5 as row keys - is this safe?

2012-07-20 Thread Michel Segel
I don't believe that there have been any reports of collisions, but if you are concerned you could use SHA-1 for generating the hash. Relatively speaking, SHA-1 is slower, but still fast enough for most applications. Don't know if its speed relative to an MD5 and string cat, but it should yi

Re: Is this exception caused by an overloaded node?

2012-07-20 Thread Jimmy Xiang
This exception means the scanner has expired on the region server side. You can adjust the scanner expiration setting, or make your client faster. Thanks, Jimmy On Fri, Jul 20, 2012 at 9:27 AM, Jonathan Bishop wrote: > Hi, > > I am running on a cluster where some of the machines are loaded for oth

Re: Use of MD5 as row keys - is this safe?

2012-07-20 Thread Damien Hardy
On 20/07/2012 18:22, Jonathan Bishop wrote: > Hi, > > I know it is commonly suggested to use an MD5 checksum to create a row > key from some other identifier, such as a string or long. This is usually > done to guard against hot-spotting and seems to work well. > > My concern is that there is no

Use of MD5 as row keys - is this safe?

2012-07-20 Thread Jonathan Bishop
Hi, I know it is commonly suggested to use an MD5 checksum to create a row key from some other identifier, such as a string or long. This is usually done to guard against hot-spotting and seems to work well. My concern is that there is no guard against collision when this is done - two different s
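
To put the collision concern above in numbers, here is a small Python sketch that derives an MD5 row key and estimates the chance of any accidental collision with the standard birthday approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n keys and a b-bit digest. The function names are made up for this example, and the estimate assumes identifiers are not chosen by an adversary; MD5 is known to be weak against deliberately constructed collisions.

```python
import hashlib
import math


def md5_row_key(identifier: str) -> bytes:
    """Return the 16-byte MD5 digest of the identifier, used as the row key."""
    return hashlib.md5(identifier.encode("utf-8")).digest()


def collision_probability(num_keys: int, digest_bits: int = 128) -> float:
    """Birthday-bound estimate of at least one accidental collision.

    Assumes hash outputs behave like uniform random values, which is an
    idealization and does not cover deliberately crafted inputs.
    """
    exponent = num_keys * (num_keys - 1) / 2 ** (digest_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    print(md5_row_key("some-identifier").hex())
    for n in (10**6, 10**9, 10**12):
        print(n, collision_probability(n))
```

If even that residual risk matters, storing the original identifier in a column and comparing it on read turns a silent overwrite into a detectable mismatch.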

Re: HTable.coprocessorExec call times out

2012-07-20 Thread Kevin
I checked the regionservers' .out log and found that they were producing an out of memory exception. I found a couple of fatal memory bugs in my code and now everything seems to be fine. I don't know how I forgot to look at the .out log file. On Fri, Jul 20, 2012 at 9:35 AM, Ted Yu wrote: > Can

Re: HTable.coprocessorExec call times out

2012-07-20 Thread Ted Yu
Can you check the following config param to see if its value is high enough? hbase.zookeeper.property.maxClientCnxns Cheers On Fri, Jul 20, 2012 at 6:23 AM, Kevin wrote: > In zookeeper I see that regionserver connections are timing out. I open an > HTable, call coprocessorExec, then I clo

Re: HTable.coprocessorExec call times out

2012-07-20 Thread Kevin
In zookeeper I see that regionserver connections are timing out. I open an HTable, call coprocessorExec, then I close the HTable. This is done in a for-loop. I'm not sure why the regionservers are timing out. I don't think it's a client-side issue anymore but maybe a server-side issue with

Re: HConnectionManager get closed

2012-07-20 Thread syed kather
Thanks, Jean-Daniel Cryans. Yes, some jobs execute successfully, but some of them fail. I don't know why this is happening. Thanks and Regards, S SYED ABDUL KATHER On Fri, Jul 20, 2012 at 12:33 AM, Jean-Daniel Cryans wrote: > Is your job configured to talk

RE: Applying QualifierFilter to one column family only.

2012-07-20 Thread Dhaval Shah
Alternatively, you can use a filter list that says: (first column family AND qualifier filter) OR second column family. -- On Fri 20 Jul, 2012 8:40 AM IST Anoop Sam John wrote: >Yes I was having this doubt. So if you know exactly the qualifier names in >advance you can

Re: Error in importtsv

2012-07-20 Thread Mohammad Tariq
Have you changed the owner of your directory to hduser??? On Friday, July 20, 2012, iwannaplay games wrote: > I ran this command > > ./hbase org.apache.hadoop.hbase.mapreduce.ImportTsv > -Dimporttsv.columns=HBASE_ROW_KEY,startip,endip,countryname IPData > /usr/ipdata.txt > > > It says : > > INFO

Re: Error in importtsv

2012-07-20 Thread lars hofhansl
importtsv runs as an M/R job, so the file needs to exist in HDFS (unless you're running in local mode, in which case you can try to use a file URL: file:///usr/ipdata.txt, although I have not tried that). See here: http://hadoop.apache.org/common/docs/r0.17.2/hdfs_shell.html specifically -copyF