Re: [ANN] HBase 0.20.0-alpha available for download

2009-06-17 Thread Andrew Purtell
Minor correction/addition: "Stargate" is undergoing shared development in two github trees: http://github.com/macdiesel/stargate/tree/master http://github.com/apurtell/stargate/tree/master The github network view can show which branch is leading at any point in time: http://github.com/

Re: [ANN] HBase 0.20.0-alpha available for download

2009-06-17 Thread jason hadoop
Is there a requirement for hadoop 0.20 for HBase 0.20? On Wed, Jun 17, 2009 at 1:44 AM, Andrew Purtell wrote: > Minor correction/addition: "Stargate" is undergoing shared development in > two github trees: > http://github.com/macdiesel/stargate/tree/master > http://github.com/apurtell/s

Re: [ANN] HBase 0.20.0-alpha available for download

2009-06-17 Thread Jean-Daniel Cryans
If you follow the "Getting Started" link that Stack gave, in the "Requirements" section, you can see that this version of HBase will only run on Hadoop 0.20. J-D On Wed, Jun 17, 2009 at 7:32 AM, jason hadoop wrote: > Is there a requirement for hadoop 0.20 for HBase 0.20? > > > On Wed, Jun 17, 200

Can't list table that exists inside HBase

2009-06-17 Thread Lucas Nazário dos Santos
Hi all, I'm running HBase 0.19.3 with Hadoop 0.19.1 on a cluster of 2 machines running Ubuntu Linux. Files are not being stored inside the /tmp folder. The problem, which has already occurred 3 times, is that, suddenly, all data stored in my table is gone after the entire cluster is restarted, either

Re: Can't list table that exists inside HBase

2009-06-17 Thread Erik Holstad
Hi Lucas! Just a quick thought. Do you have a lot of data in your cluster or just a few things in there? If you don't have that much data in HBase it might not have been flushed to disk/HDFS yet and therefore only sits in the internal memcache in HBase, so when your machines are turned off, that dat

Re: Can't list table that exists inside HBase

2009-06-17 Thread Lucas Nazário dos Santos
Hi Erik, I have only a small amount of data, something between 1500 and 3000 documents. Is there a way to force a flush of those documents? 1500 to 3000 is the number of new documents that the application I'm currently working on inserts every day, so I think it would be nice to flush them all to di

Re: Can't list table that exists inside HBase

2009-06-17 Thread Lucas Nazário dos Santos
But isn't it strange that the whole table suddenly became unavailable? Especially because it's inside HDFS. Also, I've already created tables with very few rows, 250 for instance, that stayed available after shutting down and starting HBase again. Is it because when HBase is properly shut data is flu

Re: Can't list table that exists inside HBase

2009-06-17 Thread Erik Holstad
Hi Lucas! Yeah, have a look at HBaseAdmin and you will find flush and compact. Not sure that compact is going to make a big difference in your case, since you only have one flush or so per day, but might be nice for you to run it too. Running a compaction means that all your flushed files will be re

Re: Can't list table that exists inside HBase

2009-06-17 Thread Jean-Daniel Cryans
Lucas, Your table is "missing" because the edits in the META table aren't flushed; in 0.20 we "fix" this by setting a very small maximum memcache size on both the ROOT and META tables so that the edits go to disk often. If all the nodes in your cluster are shut down at the same moment, another problem th

Re: Can't list table that exists inside HBase

2009-06-17 Thread Erik Holstad
Hi Lucas! Not sure if you have had a look at the BigTable paper; the link at the beginning of http://hadoop.apache.org/hbase/ might clear up some of the confusion. But basically what happens is that, to support fast writes, we only write to memory and periodically flush this data to disk, so while data is still

Re: Can't list table that exists inside HBase

2009-06-17 Thread Lucas Nazário dos Santos
Helped a lot! Thanks for the replies. I'll keep coding and move to newer versions of HBase and Hadoop as soon as they are out. I'll also have a look at the flush operation from HBaseAdmin. Lucas On Wed, Jun 17, 2009 at 1:58 PM, Erik Holstad wrote: > Hi Lucas! > Not sure if you have had a look

Re: Can't list table that exists inside HBase

2009-06-17 Thread stack
See also 'tools' in the hbase shell. There is a tool to flush everything in a table or an individual region. I also need to roll a 0.19.4 candidate. It has a few issues that have us flushing catalog tables way more frequently than we used to. St.Ack On Wed, Jun 17, 2009 at 10:04 AM, Lucas Nazário dos
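The flush that Erik and stack point at can be triggered from the hbase shell non-interactively. A minimal sketch, assuming a running cluster with the `hbase` launcher on the PATH; 'mytable' is a placeholder table name, not one from the thread:

```shell
# Force a memcache flush of a whole table from the hbase shell
# (assumes a running HBase cluster and `hbase` on the PATH;
# 'mytable' is a hypothetical table name)
echo "flush 'mytable'" | hbase shell
```

The same shell accepts a region name in place of the table name to flush a single region, per stack's note about the 'tools' help.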

Re: [ANN] HBase 0.20.0-alpha available for download

2009-06-17 Thread llpind
Sweet, thanks stack. I'll be upgrading as well. My client program takes far too long to simply open a scanner. This problem appears to have been addressed in .20 (https://issues.apache.org/jira/browse/HBASE-1118). In order to skip reloading the data I do the following: 1. Shutdown hadoop/hbas

Re: [ANN] HBase 0.20.0-alpha available for download

2009-06-17 Thread Ryan Rawson
In the original email: " In particular, this alpha release is without a migration script to bring your 0.19.x era data forward to work on hbase 0.20.0. " There is no migration for 0.19.x datafiles yet. That is due for the RC a few weeks away. If you can reload your data, that'd be best for now.

Re: [ANN] HBase 0.20.0-alpha available for download

2009-06-17 Thread llpind
Oops, missed that bit. Okay, thanks Ryan. Ryan Rawson wrote: > > In the original email: > > " In particular, this alpha release is without a migration > script to bring your 0.19.x era data forward to work on hbase 0.20.0. " > > There is no migration for 0.19.x datafiles yet. That is due fo

Re: [ANN] HBase 0.20.0-alpha available for download

2009-06-17 Thread stack
On Wed, Jun 17, 2009 at 2:34 PM, llpind wrote: > > Sweet thanks stack. I'll be upgrading as well. My client program takes > far > too long to simply open a scanner. This problem appears to have been > addressed in .20 (https://issues.apache.org/jira/browse/HBASE-1118). Or was it HBASE-867?

0.20 performance numbers

2009-06-17 Thread Ski Gh3
In the NOSQL meetup slides the inserts and reads are really good, but the test is on a single column of only 16 bytes. I wonder how the numbers would be affected if the row grows to 1K bytes, or even 16K bytes? If the numbers are disk-I/O bound, then we almost have to multiply the numbers by 64 or 102

Re: 0.20 performance numbers

2009-06-17 Thread Ryan Rawson
Hey, The interesting thing is that due to the way things are handled internally, small values are more challenging than large ones. The performance is not strictly IO bound or limited, and you won't be seeing corresponding slowdowns on larger values. I encourage you to download the alpha and giv

Re: 0.20 performance numbers

2009-06-17 Thread Ryan Rawson
And when I say 'test suite' i really mean "performance suite" -- that's the problem, test suites we've been running test the functionality, not the speed in a repeatable/scientific manner. -ryan On Wed, Jun 17, 2009 at 5:46 PM, Ryan Rawson wrote: > Hey, > > The interesting thing is due to the

Re: 0.20 performance numbers

2009-06-17 Thread Ski Gh3
Hmmm, don't we have a performance benchmark for comparing with Bigtable? It seems a while since someone updated that... I was just hoping that someone has a rough number in mind, so that I don't get any big surprise when I try this out on the larger row size data. Thanks! On Wed, Jun 17, 2009 at 5:5

Re: 0.20 performance numbers

2009-06-17 Thread Ryan Rawson
From the talk given at hadoop summit:

Fat Table: 1000 Rows with 10 Columns, 1MB values
Sequential insert – 68 seconds (68 ms/row)
Random reads – 56.92 ms/row (average)
Full scan – 35 seconds (3.53 seconds/100 rows, 35 ms/row)

so for 1 MB values, we are getting a value in 56ms. Scans in 35ms/row
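The per-row figures above can be sanity-checked against the quoted totals. A quick back-of-the-envelope, assuming the 1000 rows stated for the fat table:

```shell
# Derive ms/row from the quoted totals (68 s insert, 35 s scan, 1000 rows)
rows=1000
insert_total_ms=68000   # 68 seconds for the sequential insert
scan_total_ms=35000     # 35 seconds for the full scan
echo "insert: $((insert_total_ms / rows)) ms/row"   # insert: 68 ms/row
echo "scan: $((scan_total_ms / rows)) ms/row"       # scan: 35 ms/row
```

Both derived values match the per-row numbers quoted from the talk; the 3.53 s/100 rows scan figure is the same 35 ms/row rate quoted over a 100-row window.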

Re: 0.20 performance numbers

2009-06-17 Thread Ski Gh3
That's really cool~ Thanks for the info, Ryan!!! Cheers, Ski Gh On Wed, Jun 17, 2009 at 5:59 PM, Ryan Rawson wrote: > From the talk given at hadoop summit: > > Fat Table: 1000 Rows with 10 Columns,1MB values > Sequential insert – 68 seconds (68 ms/row) > Random reads – 56.92 ms/row (average) >