Row-key in HBase

2008-04-28 Thread Goel, Ankur
Hi folks, I am using an HBase table to store my crawled data, with the MD5 signature of the canonicalized URL as the row key. The Bigtable paper suggests choosing keys so that URLs from the same domain are stored close to each other and domain analysis can be carried ou

Re: Row-key in HBase

2008-04-28 Thread stack
It will. You could make a two-part key where the first half is an MD5 of the domain portion of the URL and the second half is the MD5 of the URL path portion. Your keys would be wider, but domains would sort together. St.Ack On Mon, Apr 28, 2008 at 4:01 AM, Goel, Ankur <[EMAIL PROTECTED]> wrote:
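A minimal sketch of the two-part key described above, in plain Java using only the JDK's `MessageDigest` and `URI`. The class and method names (`DomainRowKey`, `rowKey`, `md5Hex`) are hypothetical. Treating the path plus query string as the "path portion" and hex-encoding both digests (giving a 64-character key) are assumptions, and the URL is assumed to be canonicalized already.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: builds md5(host) + md5(path?query) as a row key.
public class DomainRowKey {

    // Hex-encode the MD5 digest of a string (32 lowercase hex chars).
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(32);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // First half: digest of the lowercased host, so one domain shares a prefix.
    // Second half: digest of the path plus query (an assumed definition of "path portion").
    static String rowKey(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase();
        String path = uri.getRawPath() == null ? "" : uri.getRawPath();
        String rest = path + (uri.getRawQuery() == null ? "" : "?" + uri.getRawQuery());
        return md5Hex(host) + md5Hex(rest);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("http://example.com/a.html"));
        System.out.println(rowKey("http://example.com/b.html"));
    }
}
```

Every URL on `example.com` gets the same 32-character prefix, so those rows sort next to each other, while a single page can still be fetched directly because both halves are computed from the URL alone. Storing the raw 16-byte digests instead of hex would halve the key width if that matters.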

Re: Row-key in HBase

2008-04-28 Thread Bryan Duxbury
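Yes, MD5ing your URLs will randomize the results. Do you need to access pages by the MD5 of the URL? If so, it's unlikely that you also need to access them by domain. -Bryan On Apr 28, 2008, at 4:01 AM, Goel, Ankur wrote: Hi folks, I am using an HBase table to store my crawled data, with the MD5 signature of the canonicalized URL as the row key. The Bigtable paper suggests choosing keys so that URLs from the same domain are stored close to each other and domain analysis can be carried ou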

HBase region server id problem with ipv6 address

2008-04-28 Thread Zhou
Hi, I've run into a problem starting HBase. I set up HBase with HDFS. My server's network card has an IPv4 address and also an IPv6 address. When I first started HBase with the default configuration file, I found that the region server couldn't register with the master, and I found lots of 127.0.0.1 in the log. So I su

Re: HBase region server id problem with ipv6 address

2008-04-28 Thread Bryan Duxbury
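Zhou, can you create an issue and post a patch of the changes you made? -Bryan On Apr 28, 2008, at 9:42 AM, Zhou wrote: Hi, I've run into a problem starting HBase. I set up HBase with HDFS. My server's network card has an IPv4 address and also an IPv6 address. When I first started HBase with the default configuration file, I found that the region server couldn't register with the master, and I found lots of 127.0.0.1 in the log. When I first startup hbase with defaul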

NotServingRegionException revisited

2008-04-28 Thread David Alves
Hi guys, I have found what I think is a strange case. Last Friday an M/R task failed constantly with NotServingRegionException (if a task fails for some reason, it is rerun a number of times so that service outages won't stop the process). The thing here is that that parti

RE: NotServingRegionException revisited

2008-04-28 Thread David Alves
Hi again, after going through the logs more carefully I found a FileNotFoundException (FNFE) while trying to do a compaction on that particular region. The relevant log is attached below. After the compaction failed because of the FNFE, the region is still online in .META. but no longer among the onli

Is HBase suitable for ...

2008-04-28 Thread Max Grigoriev
Hi there, I'm researching the right solution for our needs. We need a persistence layer for groups in a social network. These groups will have a large amount of data (~100 GB): user profiles, their activities, etc. All work with these entities has to be done online; a user can ask to unsubs

Errors while loading data into HBase

2008-04-28 Thread Erik Holstad
Hi! I just managed to gather the log files from one failed run. The error message varies among three different ones; this time it was Exception in thread "main" org.apache.hadoop.hbase.TableNotFoundException: Table 'y' does not exist. at org.apache.hadoop.hbase.HConnectionManager$TableServers.

Re: HBase region server id problem with ipv6 address

2008-04-28 Thread Zhou
Bryan Duxbury <[EMAIL PROTECTED]> writes: > Zhou, > Can you create an issue and post a patch of the changes you made? OK > -Bryan

Bug with IPv6 address

2008-04-28 Thread Zhou
HBase might crash when the network card has an IPv6 address. To avoid the problem, I modified a method in the class org.apache.hadoop.net.DNS. The following is the modified code of this method; it no longer returns IPv6 addresses. /** * Returns all the IPs associated with the provided interf
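The modified method itself is truncated above, so the following is not that patch. It is a hedged sketch of the same idea using only JDK classes (`java.net.NetworkInterface`, `Inet4Address`): list an interface's addresses and drop anything that is not IPv4. The class and method names (`Ipv4Only`, `ipv4Only`, `getIPv4s`) are hypothetical, and the default interface name `lo` is an assumption that varies by operating system.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not the actual org.apache.hadoop.net.DNS code.
public class Ipv4Only {

    // Keep only IPv4 addresses, returned as dotted-quad strings.
    static List<String> ipv4Only(List<InetAddress> addrs) {
        List<String> out = new ArrayList<>();
        for (InetAddress a : addrs) {
            if (a instanceof Inet4Address) {
                out.add(a.getHostAddress());
            }
        }
        return out;
    }

    // Look up an interface by name and return its IPv4 addresses only.
    // Returns an empty list if no interface has that name.
    static List<String> getIPv4s(String ifaceName) throws SocketException {
        NetworkInterface nif = NetworkInterface.getByName(ifaceName);
        if (nif == null) {
            return Collections.emptyList();
        }
        return ipv4Only(Collections.list(nif.getInetAddresses()));
    }

    public static void main(String[] args) throws Exception {
        String iface = args.length > 0 ? args[0] : "lo"; // assumed default name
        System.out.println(getIPv4s(iface));
    }
}
```

An alternative that needs no code change is starting the JVM with `-Djava.net.preferIPv4Stack=true`; whether that alone fixes the region-server registration problem described in this thread has not been verified here.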

Problem I'm having with Release Candidate 0.1.2

2008-04-28 Thread Daniel Leffel
Ever since upgrading to the first 0.1.2 release candidate, I cannot alter or drop a table. The shell reports that the operation succeeded, but the master node stops responding to any requests (although it's still running). After stopping and starting HBase, the table is unchanged. Below is a

Re: Problem I'm having with Release Candidate 0.1.2

2008-04-28 Thread Daniel Leffel
New piece of info. Now a simple select count(*) from sample_table; results in the following error: HRegionInfo was null or empty in .META. On Mon, Apr 28, 2008 at 7:04 PM, Daniel Leffel <[EMAIL PROTECTED]> wrote: > Ever since upgrading to the first 0.1.2 release candidate, I cannot alter > or d

obtain the maximum value of the row id of a table

2008-04-28 Thread Zhou Wei
Hi, I want to find the maximum value of the row id of a table. Is there a simple and efficient way to do this without scanning through the whole table starting from the first row? Thanks. Zhou

Re: obtain the maximum value of the row id of a table

2008-04-28 Thread Bryan Duxbury
Hm, tricky. You don't have to scan the whole table, just the last region. You can find out the start key of the last region by using HTable#getStartKeys. If this isn't an acceptable solution, we might be able to think up a way to get the last real row of a table more efficiently.
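The approach above can be modeled without a cluster: the table is a sorted set of row keys, the regions are given by their sorted start keys (the kind of list `HTable#getStartKeys` returns, with the first region starting at the empty key), and only the last region is scanned. This is a hypothetical in-memory sketch, not HBase client code. The fallback to earlier regions when the last one is empty is an added assumption, and Java `String` ordering matches HBase's byte ordering only for ASCII keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical model: rows stand in for the table, startKeys for its regions.
public class LastRowSketch {

    // Scan regions from last to first; the first non-empty one holds the max row.
    static String maxRow(NavigableSet<String> rows, List<String> startKeys) {
        for (int i = startKeys.size() - 1; i >= 0; i--) {
            String start = startKeys.get(i);
            // Region i covers [start_i, start_{i+1}); the last region is open-ended.
            NavigableSet<String> region = (i == startKeys.size() - 1)
                    ? rows.tailSet(start, true)
                    : rows.subSet(start, true, startKeys.get(i + 1), false);
            String max = null;
            for (String row : region) { // stand-in for a scanner over this region only
                max = row;
            }
            if (max != null) {
                return max;
            }
        }
        return null; // empty table
    }

    public static void main(String[] args) {
        NavigableSet<String> rows =
                new TreeSet<>(Arrays.asList("a", "c", "m", "p", "x", "z"));
        System.out.println(maxRow(rows, Arrays.asList("", "h", "q")));
    }
}
```

Only the rows of the last region are visited in the common case, which is the saving suggested in the reply; the cost is still a full scan of that one region.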

Re: Is HBase suitable for ...

2008-04-28 Thread Bryan Duxbury
My replies and questions are inline. On Apr 28, 2008, at 2:57 PM, Max Grigoriev wrote: Hi there, I'm researching the right solution for our needs. We need a persistence layer for groups in a social network. These groups will have a large amount of data (~100 GB): user profiles, their activiti