Re: How to rename table's family name

2011-01-08 Thread Stack
Thanks for the below M. C. I like this delete suggestion. Plan is in 0.92 or 0.94 moving schema out of .META. up into zk. We're aiming for online schema editing w/o having to take a table offline. When schema changes, regionservers are notified and take appropriate action. To respond to Andrey,

Re: log reply failures, how to resolve

2011-01-08 Thread Jack Levin
Sure, this does not resolve the lease issue. To reproduce, just restart the namenode, have hbase hdfs clients fail, then try cold restart of the cluster -Jack On Jan 8, 2011, at 6:50 PM, Todd Lipcon wrote: > Hi Jack, > > Do you have a rack topology script set up for HDFS? > > -Todd > > O

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Jack Levin
Suppose we used different families, how would it help ? -Jack On Jan 8, 2011, at 6:47 PM, Todd Lipcon wrote: > Hi Jack, > > Why not put photos and texts in separate column families? > > -Todd > > On Sat, Jan 8, 2011 at 2:57 PM, Jack Levin wrote: > >> Future wise we plan to have millions

Re: log reply failures, how to resolve

2011-01-08 Thread Todd Lipcon
Hi Jack, Do you have a rack topology script set up for HDFS? -Todd On Fri, Jan 7, 2011 at 6:32 PM, Jack Levin wrote: > Greetings all. I have been observing some interesting problems that > sometimes making hbase start/restart very hard to achieve. Here is a > situation: > > Power goes out of

Re: problem with LZO compressor on write only loads

2011-01-08 Thread Todd Lipcon
Hey everyone, Just wanted to let you know that I will be looking into this this coming week - we've marked it as an important thing to investigate prior to our next beta release. Thanks -Todd On Sat, Jan 8, 2011 at 4:59 AM, Tatsuya Kawano wrote: > > Hi Friso, > > So you found HBase 0.89 on CDH3b

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Todd Lipcon
Hi Jack, Why not put photos and texts in separate column families? -Todd On Sat, Jan 8, 2011 at 2:57 PM, Jack Levin wrote: > Future wise we plan to have millions of rows, probably across multiple > regions, even if IO is not a problem, doing millions of filter operations > does not make much s

Re: What is a HBase backup strategy?

2011-01-08 Thread Alexey Kovyrin
I'm pretty sure it could potentially create an inconsistent copy of your database. On Sat, Jan 8, 2011 at 5:59 PM, Jack Levin wrote: > distcp into a different hadoop cluster nightly, maybe a valid choice as > well. > > -Jack > > On Sat, Jan 8, 2011 at 7:37 AM, Ted Yu wrote: > >> One option is to

Re: What is a HBase backup strategy?

2011-01-08 Thread Jack Levin
distcp into a different hadoop cluster nightly, maybe a valid choice as well. -Jack On Sat, Jan 8, 2011 at 7:37 AM, Ted Yu wrote: > One option is to use org.apache.hadoop.hbase.mapreduce.Export > Later you can Import the data back. > > On Sat, Jan 8, 2011 at 12:12 AM, Sean Bigdatafun > wrote: >

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Jack Levin
Future wise we plan to have millions of rows, probably across multiple regions, even if IO is not a problem, doing millions of filter operations does not make much sense. -Jack On Sat, Jan 8, 2011 at 2:54 PM, Andrey Stepachev wrote: > Ok. Understand. > > But do you check is it really an issue?

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Andrey Stepachev
Ok. Understand. But do you check is it really an issue? I think that it is only 1 IO here, (especially if compression used)? You have big rows? 2011/1/9 Jack Levin > Sorting is not the issue, the location of data can be in the beginning, > middle or end, or any combination of thereof. I only

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Jack Levin
Sorting is not the issue, the location of data can be in the beginning, middle or end, or any combination thereof. I have only given the worst case scenario example, I understand that filtering will produce results we want but at the cost of examining every row and offloading AND/join logic to the appli

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Andrey Stepachev
More details on binary sorting you can read http://brunodumon.wordpress.com/2010/02/17/building-indexes-using-hbase-mapping-strings-numbers-and-dates-onto-bytes/ 2011/1/8 Jack Levin > Basic problem described: > > user uploads 1 image and creates some text -10 days ago, then creates 1000 > text m

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Andrey Stepachev
Hm. But what is the problem with using Long.MAX - dayNum instead of dayNum? In this case you get all data sorted in reverse order, and you get the latest entries first in scan results? 2011/1/8 Jack Levin > Basic problem described: > > user uploads 1 image and creates some text -10 days ago, then creates 10
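
The reverse-ordering idea above can be sketched as follows. This is only a minimal illustration in Python, not HBase client code: it assumes day numbers are non-negative, takes "Long.MAX" to mean the value of Java's Long.MAX_VALUE, and uses a hypothetical key layout (8 big-endian bytes followed by the uid). The function name reverse_day_key is made up for the example.

```python
import struct

# Assumption: "Long.MAX" means Java's Long.MAX_VALUE (2^63 - 1).
LONG_MAX = 2**63 - 1

def reverse_day_key(day_num: int, uid: str) -> bytes:
    """Hypothetical row key: big-endian (LONG_MAX - day_num), then the uid."""
    if not 0 <= day_num <= LONG_MAX:
        raise ValueError("day_num out of range")
    return struct.pack(">q", LONG_MAX - day_num) + uid.encode("utf-8")

# Build keys for three days in arbitrary order.
keys = [reverse_day_key(d, "uid") for d in (1, 10, 5)]

# Python compares bytes lexicographically as unsigned values,
# which is also how HBase orders row keys.
scan_order = sorted(keys)

# Recover the day numbers in the order a scan would return them.
days_in_scan_order = [LONG_MAX - struct.unpack(">q", k[:8])[0] for k in scan_order]
print(days_in_scan_order)
```

With this layout the largest day number sorts first, so a scan that starts at the beginning of a user's range sees the most recent days before the older ones.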

Re: Region loadbalancing

2011-01-08 Thread M. C. Srivas
If you did the change, can you share your experience/results? On Wed, Dec 15, 2010 at 12:04 AM, Jan Lukavský wrote: > We can give it a try. Currently we use 512 MiB per region, is there any > upper bound for this value which is not recommended to cross? Are there any > side-effects we may expect

Re: How to rename table's family name

2011-01-08 Thread M. C. Srivas
In general, there's a need for a loose "schema" to allow not only renames of columns and column-families, but efficient delete of entire columns or CFs. (eg, mark this C as deleted in the "schema" and remove it during the next major compaction). But implementing the master-coordination for this (for

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Jack Levin
Basic problem described: user uploads 1 image and creates some text -10 days ago, then creates 1000 text messages on between 9 days ago and today: row key | fm:type --> value 00days:uid | type:text --> text_id . . 09days:uid | type:text --> text_id 10days:uid | type:photo

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Stack
Strike that. This is a Scan, so can't do blooms + filter. Sorry. Sounds like a coprocessor then. You'd have your query 'lean' on the column that you know has the lesser items and then per item, you'd do a get inside the coprocessor against the column of many entries. The get would go via blooms
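
The "lean on the smaller column, then do a get per item" idea can be sketched outside HBase like this. It is a rough Python illustration only: plain dicts stand in for the two columns, and the function and_join, the row keys and the ids are all hypothetical. It does not use the coprocessor API or bloom filters.

```python
def and_join(small_column: dict, large_column_get) -> dict:
    """Return rows present in both columns, driving from the smaller one."""
    matches = {}
    for row_key, small_value in small_column.items():
        # One point lookup per row of the small column, instead of
        # scanning every row of the large column.
        large_value = large_column_get(row_key)
        if large_value is not None:
            matches[row_key] = (small_value, large_value)
    return matches

# Hypothetical data: a few photo rows and many text rows.
photos = {"10days:uid": "photo_id_1", "03days:uid": "photo_id_2"}
texts = {f"{d:02d}days:uid": f"text_id_{d}" for d in range(10)}

result = and_join(photos, texts.get)
print(result)
```

The number of lookups is bounded by the size of the smaller column, which is the point of driving the query from it.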

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Stack
On Sat, Jan 8, 2011 at 11:35 AM, Jack Levin wrote: > Yes, we thought about using filters, the issue is, if one family > column has 1ml values, and second family column has 10 values at the > bottom, we would end up scanning and filtering 0 records and > throwing them away, which seems ineffici

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Stack
Sounds like you need to write a little filter Jack, one that filters all that does not have values from all query columns. Maybe you can manhandle SkipFilter into doing the job? http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/filter/SkipFilter.html St.Ack On Sat, Jan

Re: How to rename table's family name

2011-01-08 Thread Andrey Stepachev
2011/1/8 Stack > > > Perhaps we should consider > detaching CF name from whats stored? > Yes! Are there any jira? I'll vote for it. > > St.Ack >

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Jack Levin
Yes, we thought about using filters, the issue is, if one family column has 1ml values, and second family column has 10 values at the bottom, we would end up scanning and filtering 0 records and throwing them away, which seems inefficient. The only solution is to break the tables apart, and do

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Andrey Stepachev
I don't think that it is possible at the scanner level with bloomfilters (families are in separate files, so they are scanned independently). But you can use filters to filter out unneeded data. 2011/1/8 Jack Levin > Hello all, I have a scanner question, we have this table: > > hbase(main):002:0> scan

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Jack Levin
Sorry, my mistake, right now it's only OR, and we really need AND. I would think that with bloomfilters this could be a sweet feature to produce if it's not there. -Jack On Fri, Jan 7, 2011 at 10:50 PM, Phil Whelan wrote: > Hi Jack, > > I'm just trying follow the logic and I'm a bit confused. > >

Breaking down an HBase read through thrift

2011-01-08 Thread Wayne
I am trying to understand exactly what an HBase read is doing through Thrift (python) so that we can know what to change to improve our performance (read latency). We have turned off all cache to make testing consistent. *Region/Meta Cache * Often times the region list is not "hot" and thrift has

Re: What is a HBase backup strategy?

2011-01-08 Thread Ted Yu
One option is to use org.apache.hadoop.hbase.mapreduce.Export Later you can Import the data back. On Sat, Jan 8, 2011 at 12:12 AM, Sean Bigdatafun wrote: > For RDBMS systems, people normally back up their data from time to time. > What > is the backup strategy (and using what tool) for HBase? > Th

Re: problem with LZO compressor on write only loads

2011-01-08 Thread Tatsuya Kawano
Hi Friso, So you found HBase 0.89 on CDH3b2 doesn't have the problem. I wonder what would happen if you replace hadoop-core-*.jar in CDH3b3 with the one contained in HBase 0.90RC distribution (hadoop-core-0.20-append-r1056497.jar) and then rebuild hadoop-lzo against it. Here is the comment

Re: problem with LZO compressor on write only loads

2011-01-08 Thread Friso van Vollenhoven
Hey Ryan, I went back to the older version. Problem is that going to HBase 0.90 requires an API change on the compressor side, which forces you to a version newer than 0.4.6 or so. So I also had to go back to HBase 0.89, which is again not compatible with CDH3b3, so I am back on CDH3b2 again. HBa

Re: Getting rid of "delete forward" in HBase 0.92+, please weigh in

2011-01-08 Thread Henning Blohm
+1 from here as well Please let delete work as if it was just a special marker value of a column (i.e. with a time stamp and all). On Fri, 2011-01-07 at 19:24 -0800, M. C. Srivas wrote: > +1 > > Just a clarification : by delete-forward, do you mean that a delete of a > non-existent key causes

What is a HBase backup strategy?

2011-01-08 Thread Sean Bigdatafun
For RDBMS systems, people normally back up their data from time to time. What is the backup strategy (and using what tool) for HBase? Thanks, -- --Sean