Thanks for the below, M. C. I like this delete suggestion. The plan is,
in 0.92 or 0.94, to move the schema out of .META. up into zk. We're aiming
for online schema editing w/o having to take a table offline. When the
schema changes, regionservers are notified and take appropriate action.
To respond to Andrey:
Sure, this does not resolve the lease issue. To reproduce, just restart the
namenode, have the HBase HDFS clients fail, then try a cold restart of the cluster.
-Jack
On Jan 8, 2011, at 6:50 PM, Todd Lipcon wrote:
> Hi Jack,
>
> Do you have a rack topology script set up for HDFS?
>
> -Todd
>
> On Fri, Jan 7, 2011 at 6:32 PM, Jack Levin wrote:
Suppose we used different families; how would it help?
-Jack
On Jan 8, 2011, at 6:47 PM, Todd Lipcon wrote:
> Hi Jack,
>
> Why not put photos and texts in separate column families?
>
> -Todd
>
> On Sat, Jan 8, 2011 at 2:57 PM, Jack Levin wrote:
>
>> Future-wise, we plan to have millions
Hi Jack,
Do you have a rack topology script set up for HDFS?
-Todd
On Fri, Jan 7, 2011 at 6:32 PM, Jack Levin wrote:
> Greetings all. I have been observing some interesting problems that
> sometimes make HBase start/restart very hard to achieve. Here is the
> situation:
>
> Power goes out of
Hey everyone,
Just wanted to let you know that I will be looking into this in the coming
week - we've marked it as an important thing to investigate prior to our next
beta release.
Thanks
-Todd
On Sat, Jan 8, 2011 at 4:59 AM, Tatsuya Kawano wrote:
>
> Hi Friso,
>
> So you found that HBase 0.89 on CDH3b2 doesn't have the problem.
Hi Jack,
Why not put photos and texts in separate column families?
-Todd
On Sat, Jan 8, 2011 at 2:57 PM, Jack Levin wrote:
> Future-wise, we plan to have millions of rows, probably across multiple
> regions; even if IO is not a problem, doing millions of filter operations
> does not make much sense.
I'm pretty sure it could potentially create an inconsistent copy of
your database.
On Sat, Jan 8, 2011 at 5:59 PM, Jack Levin wrote:
> distcp into a different Hadoop cluster nightly may be a valid choice as
> well.
>
> -Jack
>
> On Sat, Jan 8, 2011 at 7:37 AM, Ted Yu wrote:
>
>> One option is to use org.apache.hadoop.hbase.mapreduce.Export
distcp into a different Hadoop cluster nightly may be a valid choice as
well.
-Jack
On Sat, Jan 8, 2011 at 7:37 AM, Ted Yu wrote:
> One option is to use org.apache.hadoop.hbase.mapreduce.Export
> Later you can Import the data back.
>
> On Sat, Jan 8, 2011 at 12:12 AM, Sean Bigdatafun wrote:
>
Future-wise, we plan to have millions of rows, probably across multiple
regions; even if IO is not a problem, doing millions of filter operations
does not make much sense.
-Jack
On Sat, Jan 8, 2011 at 2:54 PM, Andrey Stepachev wrote:
> OK, understood.
>
> But did you check whether it is really an issue?
OK, understood.
But did you check whether it is really an issue? I think it is only 1 IO here
(especially if compression is used). Do you have big rows?
2011/1/9 Jack Levin
> Sorting is not the issue; the location of the data can be in the beginning,
> middle, or end, or any combination thereof. I only
Sorting is not the issue; the location of the data can be in the beginning,
middle, or end, or any combination thereof. I only gave the worst-case
scenario as an example. I understand that filtering will produce the results
we want, but at the cost of examining every row and offloading AND/join logic
to the application.
You can read more details on binary sorting at
http://brunodumon.wordpress.com/2010/02/17/building-indexes-using-hbase-mapping-strings-numbers-and-dates-onto-bytes/
2011/1/8 Jack Levin
> Basic problem described:
>
> A user uploads 1 image and creates some text 10 days ago, then creates 1000
> text messages between 9 days ago and today:
Hm. But what is the problem with using Long.MAX - dayNum instead of dayNum?
In this case you get all the data sorted in reverse order, and the last
entries come first in the scan results.
2011/1/8 Jack Levin
> Basic problem described:
>
> A user uploads 1 image and creates some text 10 days ago, then creates 10
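A minimal, untested sketch of the Long.MAX - dayNum idea above, using HBase's
Bytes utility (the class and method names here are invented for illustration):

  import org.apache.hadoop.hbase.util.Bytes;

  public class ReverseDayKeys {
      // Key layout: [Long.MAX_VALUE - dayNum (8 bytes, big-endian)] + ":" + uid.
      // HBase orders row keys lexicographically on raw bytes, so inverting
      // the day number makes the most recent day sort first in a plain Scan.
      public static byte[] rowKey(long dayNum, String uid) {
          return Bytes.add(Bytes.toBytes(Long.MAX_VALUE - dayNum),
                           Bytes.toBytes(":" + uid));
      }
  }

With such a key, a forward Scan returns the newest day's entries first.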
If you made the change, can you share your experience/results?
On Wed, Dec 15, 2010 at 12:04 AM, Jan Lukavský wrote:
> We can give it a try. Currently we use 512 MiB per region; is there an
> upper bound for this value that is not recommended to cross? Are there any
> side-effects we may expect
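The knob in question is hbase.hregion.max.filesize. A minimal, untested
sketch of setting it from client code (it normally belongs in hbase-site.xml;
the 1 GiB value is only an example):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class RegionSizeExample {
      public static void main(String[] args) {
          Configuration conf = HBaseConfiguration.create();
          // Let regions grow to 1 GiB before splitting (the 0.90-era
          // default is 256 MB).
          conf.setLong("hbase.hregion.max.filesize", 1024L * 1024 * 1024);
      }
  }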
In general, there's a need for a loose "schema" to allow not only renames of
columns and column families, but also efficient deletes of entire columns or
CFs (e.g., mark this column as deleted in the "schema" and remove it during
the next major compaction). But implementing the master coordination for this (for
Basic problem described:
A user uploads 1 image and creates some text 10 days ago, then creates 1000
text messages between 9 days ago and today:
row key | fm:type --> value
00days:uid | type:text --> text_id
.
.
09days:uid | type:text --> text_id
10days:uid | type:photo
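A rough, untested sketch of reading one day's slice of that layout with the
0.90-style client API (the table name "messages" is invented); the stop row
uses ';' because it is the byte immediately after ':', making it an exclusive
upper bound for every key with that prefix:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class DayScan {
      public static void main(String[] args) throws Exception {
          HTable table = new HTable(HBaseConfiguration.create(), "messages");
          // All rows whose key starts with "10days:".
          Scan scan = new Scan(Bytes.toBytes("10days:"), Bytes.toBytes("10days;"));
          ResultScanner scanner = table.getScanner(scan);
          try {
              for (Result r : scanner) {
                  System.out.println(Bytes.toString(r.getRow()));
              }
          } finally {
              scanner.close();
          }
      }
  }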
Strike that. This is a Scan, so we can't do blooms + filter. Sorry.
Sounds like a coprocessor then. You'd have your query 'lean' on the
column that you know has the fewer items, and then, per item, you'd do
a get inside the coprocessor against the column with many entries. The
get would go via blooms
On Sat, Jan 8, 2011 at 11:35 AM, Jack Levin wrote:
> Yes, we thought about using filters; the issue is, if one family
> column has 1M values, and the second family column has 10 values at the
> bottom, we would end up scanning and filtering 1M records and
> throwing them away, which seems inefficient.
Sounds like you need to write a little filter, Jack, one that filters out
everything that does not have values from all the query columns. Maybe you
can manhandle SkipFilter into doing the job?
http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/filter/SkipFilter.html
St.Ack
On Sat, Jan
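A rough, untested sketch of that filtering idea, using SingleColumnValueFilter
with filterIfMissing rather than SkipFilter itself (the family/qualifier
names are invented):

  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
  import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
  import org.apache.hadoop.hbase.util.Bytes;

  public class DropRowsMissingColumn {
      public static Scan scanRequiringColumn() {
          // Drop any row that has no value in type:photo.
          SingleColumnValueFilter mustHavePhoto = new SingleColumnValueFilter(
              Bytes.toBytes("type"), Bytes.toBytes("photo"),
              CompareOp.NOT_EQUAL, Bytes.toBytes(""));
          mustHavePhoto.setFilterIfMissing(true); // skip rows lacking the column
          Scan scan = new Scan();
          scan.setFilter(mustHavePhoto);
          return scan;
      }
  }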
2011/1/8 Stack
>
>
> Perhaps we should consider
> detaching the CF name from what's stored?
>
Yes! Is there a JIRA for it? I'll vote for it.
>
> St.Ack
>
Yes, we thought about using filters; the issue is, if one family
column has 1M values, and the second family column has 10 values at the
bottom, we would end up scanning and filtering 1M records and
throwing them away, which seems inefficient. The only solution is to
break the tables apart, and do
I don't think that it is possible at the scanner level with bloom filters
(families are in separate files, so they are scanned independently).
But you can use filters to filter out unneeded data.
2011/1/8 Jack Levin
> Hello all, I have a scanner question; we have this table:
>
> hbase(main):002:0> scan
Sorry, my mistake; right now it's only OR, and we really need AND.
I would think that with bloom filters this could be a sweet feature to
add if it's not there.
-Jack
On Fri, Jan 7, 2011 at 10:50 PM, Phil Whelan wrote:
> Hi Jack,
>
> I'm just trying to follow the logic and I'm a bit confused.
>
>
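For what it's worth, the client-side FilterList can already combine filters
with AND semantics via Operator.MUST_PASS_ALL. A minimal, untested sketch
(the column names are invented):

  import java.util.Arrays;

  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
  import org.apache.hadoop.hbase.filter.Filter;
  import org.apache.hadoop.hbase.filter.FilterList;
  import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
  import org.apache.hadoop.hbase.util.Bytes;

  public class AndScan {
      public static Scan andScan() {
          // Keep only rows that have values in both type:text and type:photo.
          SingleColumnValueFilter hasText = new SingleColumnValueFilter(
              Bytes.toBytes("type"), Bytes.toBytes("text"),
              CompareOp.NOT_EQUAL, Bytes.toBytes(""));
          hasText.setFilterIfMissing(true);
          SingleColumnValueFilter hasPhoto = new SingleColumnValueFilter(
              Bytes.toBytes("type"), Bytes.toBytes("photo"),
              CompareOp.NOT_EQUAL, Bytes.toBytes(""));
          hasPhoto.setFilterIfMissing(true);
          FilterList both = new FilterList(FilterList.Operator.MUST_PASS_ALL,
              Arrays.<Filter>asList(hasText, hasPhoto));
          Scan scan = new Scan();
          scan.setFilter(both);
          return scan;
      }
  }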
I am trying to understand exactly what an HBase read is doing through Thrift
(Python) so that we can know what to change to improve our performance (read
latency). We have turned off all caching to make testing consistent.
*Region/Meta Cache*
Oftentimes the region list is not "hot" and Thrift has
One option is to use org.apache.hadoop.hbase.mapreduce.Export.
Later you can Import the data back.
On Sat, Jan 8, 2011 at 12:12 AM, Sean Bigdatafun wrote:
> For RDBMS systems, people normally back up their data from time to time.
> What is the backup strategy (and using what tools) for HBase?
> Thanks,
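Both are stock MapReduce jobs launched from the command line, something like
"hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir>" to
dump a table and "hbase org.apache.hadoop.hbase.mapreduce.Import <tablename>
<inputdir>" to load it back; running either with no arguments prints the
exact usage.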
Hi Friso,
So you found that HBase 0.89 on CDH3b2 doesn't have the problem. I wonder
what would happen if you replaced the hadoop-core-*.jar in CDH3b3 with the one
contained in the HBase 0.90RC distribution (hadoop-core-0.20-append-r1056497.jar)
and then rebuilt hadoop-lzo against it.
Here is the comment
Hey Ryan,
I went back to the older version. The problem is that going to HBase 0.90
requires an API change on the compressor side, which forces you to a
hadoop-lzo version newer than 0.4.6 or so. So I also had to go back to HBase
0.89, which is again not compatible with CDH3b3, so I am back on CDH3b2 again. HBa
+1 from here as well.
Please let delete work as if it were just a special marker value of a
column (i.e., with a timestamp and all).
On Fri, 2011-01-07 at 19:24 -0800, M. C. Srivas wrote:
> +1
>
> Just a clarification: by delete-forward, do you mean that a delete of a
> non-existent key causes
For RDBMS systems, people normally back up their data from time to time. What
is the backup strategy (and using what tools) for HBase?
Thanks,
--
--Sean