Re: WrongRegionException -- add_table.rb screwed up my hbase.

2010-05-11 Thread maxjar10
Thanks for the reply St.Ack. No worries, it's the nature of early development and I look forward to a lot of great things for HBase and Hadoop. Just wanted to document the issues for posterity purposes. stack-3 wrote: > > Sorry for trouble caused. I thought that 0.20.4 added updating of > .reg

Re: WrongRegionException -- add_table.rb screwed up my hbase.

2010-05-11 Thread Stack
Sorry for trouble caused. I thought that 0.20.4 added updating of .regioninfo on re-enable of table but I don't see it. Nonetheless, I'd suggest you update to 0.20.4. It should have fixes at least to save you from WRE going forward. Thanks for writing the list, St.Ack On Tue, May 11, 2010 at 9:

Re: WrongRegionException -- add_table.rb screwed up my hbase.

2010-05-11 Thread maxjar10
Answered my own question. The .regioninfo files are there specifically for performing fsck functionalities like using add_table.rb. The problem is that the .regioninfo files are NOT updated after an alter. This issue is described in: https://issues.apache.org/jira/browse/HBASE-2366 The purpose

Re: Using HBase on other file systems

2010-05-11 Thread Kevin Apte
I think Gluster also supports large amounts of data, but as I understand it, Gluster nodes are meant to be "bricks", that is, they are meant only for storage. In Map-Reduce use, people talk about Map/Reduce jobs running near the storage. What does that mean? They run on the same node th

RE: Running HBase in Standalone Limitation

2010-05-11 Thread Jonathan Gray
The HBase process just died? The logs end suddenly with nothing about shutting down, no exceptions, etc? Did you check the .out files as well? > -Original Message- > From: Jorome m [mailto:jorom...@gmail.com] > Sent: Tuesday, May 11, 2010 5:58 PM > To: hbase-user@hadoop.apache.org > Sub

WrongRegionException -- add_table.rb screwed up my hbase.

2010-05-11 Thread maxjar10
Ok, here's my story in case anyone else encounters the same issue... My question is this... Why does the table descriptor/meta table information not match the .regioninfo in each region sub dir? Is this a bad thing? Read below... HBase 0.23-1 Hadoop 0.20.1 So I wanted to add compression to my

Running HBase in Standalone Limitation

2010-05-11 Thread Jorome m
I wonder if there is a known performance limitation when running HBase in standalone mode? In particular, is there an upper bound on resource usage (e.g., total number of HTables, column families, number of row records per HTable) in a single region server that is not backed by HDFS? I'm experien

RE: Using HBase on other file systems

2010-05-11 Thread Buttler, David
If you are opening up the discussion to HDFS, I would really like to think more deeply as to why HDFS is a better choice for some workloads than, say, Lustre or GPFS. The things I like about HDFS over Lustre are that 1) it is easier to set up 2) HDFS by default has local storage (as opposed to st

Hadoop Training @ Hadoop Summit - Early Bird Discount Expires Soon!

2010-05-11 Thread Christophe Bisciglia
Hadoop Fans, just a quick note about training options at the Hadoop Summit. There are discounts expiring soon, so if you planned to attend, or didn't know, we want to make sure you stay in the loop. We're offering certification courses for developers and admins, as well as an introduction to Hadoo

Re: Using HBase on other file systems

2010-05-11 Thread Jeff Hammerbacher
Hey Edward, > I do think that if you compare GoogleFS to HDFS, GFS looks more full > featured. > What features are you missing? Multi-writer append was explicitly called out by Sean Quinlan as a bad idea, and rolled back. From internal conversations with Google engineers, erasure coding of blocks s

Re: Using HBase on other file systems

2010-05-11 Thread Edward Capriolo
On Tue, May 11, 2010 at 5:40 PM, Jeff Hammerbacher wrote: > Okay, the assertion that HBase is only interesting if you need HDFS is > continuing to rankle for me. On the surface, it sounds reasonable, but it's > just so wrong. The specifics cited (caching, HFile, and compaction) are > actually all

Re: Using HBase on other file systems

2010-05-11 Thread Jeff Hammerbacher
Okay, the assertion that HBase is only interesting if you need HDFS is continuing to rankle for me. On the surface, it sounds reasonable, but it's just so wrong. The specifics cited (caching, HFile, and compaction) are actually all advantages of the HBase design. 1) Caching: any data store which t

Re: Using HBase on other file systems

2010-05-11 Thread Jeff Hammerbacher
Hey Edward, Database systems have been built for decades against a storage medium (spinning magnetic platters) which has the same characteristics you point out in HDFS. In the interim, they've managed to service a large number of low latency workloads in a reasonable fashion. There's a reason the

Re: Using HBase on other file systems

2010-05-11 Thread Edward Capriolo
On Tue, May 11, 2010 at 3:51 PM, Jeff Hammerbacher wrote: > Hey, > > Thanks for the evaluation, Andrew. Ceph certainly is elegant in design; > HDFS, similar to GFS [1], was purpose-built to get into production quickly, > so its current incarnation lacks some of the same elegance. On the other > ha

Re: Using HBase on other file systems

2010-05-11 Thread Jeff Hammerbacher
Hey, Thanks for the evaluation, Andrew. Ceph certainly is elegant in design; HDFS, similar to GFS [1], was purpose-built to get into production quickly, so its current incarnation lacks some of the same elegance. On the other hand, there are many techniques for making the metadata servers scalable

Re: Problem with performance with many columns in column familie

2010-05-11 Thread Ted Yu
jstack is a handy tool: http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstack.html On Tue, May 11, 2010 at 9:50 AM, Sebastian Bauer wrote: > Ram is not a problem, second region server using about 550mB and first > about 300mB problem is with CPU, when i making queries to both column > famiel
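For readers unfamiliar with thread dumps, here is a minimal sketch of what one contains. jstack attaches to a running JVM (such as a region server) from outside; the code below is not how jstack works, it only produces a similar listing from inside a process using the standard Thread.getAllStackTraces() call. The class name ThreadDump and the output layout are hypothetical, chosen for illustration.

```java
import java.util.Map;

// Hypothetical helper: prints each live thread's name, state, and stack frames,
// roughly the information a jstack dump shows for a hung process.
class ThreadDump {

    static String dump() {
        StringBuilder sb = new StringBuilder();
        // getAllStackTraces() returns a snapshot of every live thread in this JVM.
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

When reading a dump from a busy server, threads that show the same top frames across several dumps taken a few seconds apart are the usual place to start looking.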

Re: Problem with performance with many columns in column familie

2010-05-11 Thread Sebastian Bauer
RAM is not a problem: the second region server is using about 550 MB and the first about 300 MB. The problem is with CPU. When I am making queries to both column families, the second region server is using about 40%-80% and the first about 10%; after turning off queries to AdvToUsers (the big one), CPU on both servers is 2-7%.

Re: Problem with performance with many columns in column familie

2010-05-11 Thread Stack
You could try thread-dumping the regionserver to try and figure where it's hung up. Counters are usually fast so maybe it's something to do w/ 8k of them in the one row. What kinda numbers are you seeing? How much RAM you throwing at the problem? Yours, St.Ack On Tue, May 11, 2010 at 8:51 AM,

Re: Enabling Indexing in HBase

2010-05-11 Thread Jean-Daniel Cryans
Per http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/package-summary.html#overview your client has to know where your zookeeper setup is. Since you want to use HBase in a distributed fashion, that means you went through http://hadoop.apache.org/hbase/docs/r0.20.4/api/

Problem with performance with many columns in column familie

2010-05-11 Thread Sebastian Bauer
Hi, maybe I'll get help here :) I have 2 tables, UserToAdv and AdvToUsers. UserToAdv is simple: { "row_id" => [ {"adv:": }, {"adv:": }, ... about 100 columns ] only one kind of operation is performed - increasing cou

RE: Enabling Indexing in HBase

2010-05-11 Thread Michelan Arendse
Thanks. I have added that to the class path, but I still get an error. This is the error that I get: 10/05/11 13:41:27 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=6 watcher=org.apache.hadoop.hbase.client.hconnectionmanager$clientzkwatc.

public HBase 0.20.4 EC2 AMIs available in all regions

2010-05-11 Thread Andrew Purtell
HBase 0.20.4 EC2 AMIs are now available in all regions. These are instance store backed AMIs. The latest launch scripts can be found here: https://hbase.s3.amazonaws.com/hbase-ec2-0.20.4.tar.gz Region -- AMIID ArchName -- -