Re: How to do a fast range scan on a prefix

2015-06-05 Thread jeremy p
if (rowPrefix == null) {
  setStartRow(HConstants.EMPTY_START_ROW);
  setStopRow(HConstants.EMPTY_END_ROW);
} else {
  this.setStartRow(rowPrefix);
  this.setStopRow(calculateTheClosestNextRowKeyForPrefix(rowPrefix));
}

Re: How to do a fast range scan on a prefix

2015-06-05 Thread jeremy p
I've heard that PrefixFilter does a full table scan, and that a range scan is faster. Am I mistaken? On Fri, Jun 5, 2015 at 2:22 PM, Ted Yu wrote: You can utilize PrefixFilter. See example in http://hbase.apache.org/book.html#scan On Fri, Jun 5, 2015 at 11:18 A
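A minimal sketch of the distinction being discussed, assuming the 0.98/1.0-era client API and a hypothetical helper class name: PrefixFilter by itself still starts the scan at the first row of the table, so seeding the scan with the prefix as its start row lets the region server seek directly to the prefix, and the filter then ends the scan once row keys pass it.

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PrefixScanHelper {
      // Seed the scan with the prefix as the start row so the region server can
      // seek straight to it; PrefixFilter stops the scan once keys pass the prefix.
      public static Scan prefixScan(String prefix) {
        byte[] p = Bytes.toBytes(prefix);
        Scan scan = new Scan();
        scan.setStartRow(p);
        scan.setFilter(new PrefixFilter(p));
        return scan;
      }
    }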

How to do a fast range scan on a prefix

2015-06-05 Thread jeremy p
Assume that my keys look like this: bar:0, bar:1, bar:2, baz:0, baz:1, foo:0, foo:1, foo:2. How do I do a fast range scan that returns all the rows that begin with "baz:"? Assume that I know nothing about any of the other rows in the table. Thanks for taking a look! --Jeremy
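A hedged sketch of the range-scan approach this thread settles on, with a placeholder table name: bound the scan with the prefix as the inclusive start row and the prefix with its last byte incremented as the exclusive stop row (the simple increment assumes the prefix's last byte is not 0xFF, which holds for ":" here).

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BazPrefixScan {
      // Smallest row key that sorts after every key starting with the prefix
      // (assumes the prefix's last byte is not 0xFF).
      static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        stop[stop.length - 1]++;                                 // "baz:" -> "baz;"
        return stop;
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_table");             // placeholder table name
        byte[] prefix = Bytes.toBytes("baz:");
        Scan scan = new Scan(prefix, stopRowForPrefix(prefix));  // start inclusive, stop exclusive
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            System.out.println(Bytes.toString(r.getRow()));      // prints baz:0, baz:1
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }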

Re: RowKey hashing in HBase 1.0

2015-05-13 Thread jeremy p
The last region fills, but after it splits, the top half is static. The new rows are added to the bottom half only. This is a problem with sequential keys that you have to learn to live with. It's not a killer issue, but something you need to be aware… On May 6,

Re: RowKey hashing in HBase 1.0

2015-05-06 Thread jeremy p
ng n lists of rows, but you're still always adding to the end of the list. Does that make sense? On May 5, 2015, at 10:04 AM, jeremy p wrote: Thank you for your response! So I guess 'salt' is a bit of a misnomer.

Re: RowKey hashing in HBase 1.0

2015-05-05 Thread jeremy p
Common? Only if your row key is mostly sequential. Note that even with bucketing, you will still end up with regions only 1/2 full, with the only exception being the last region. On May 1, 2015, at 11:09 AM, jeremy p wrote: Hello

RowKey hashing in HBase 1.0

2015-05-01 Thread jeremy p
Hello all, I've been out of the HBase world for a while, and I'm just now jumping back in. As of HBase 0.94, it was still common to take a hash of your RowKey and use that to "salt" the beginning of your RowKey to obtain an even distribution among your region servers. Is this still a common pract
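For reference, a rough sketch of the salting pattern the question describes, with an assumed bucket count and a hypothetical key format: derive a bucket from a hash of the natural key and prepend it, so sequential keys spread across pre-split regions. As the replies above note, the cost is that a range read then needs one scan per bucket.

    import java.util.Arrays;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SaltedKeys {
      private static final int BUCKETS = 16;   // assumed; match the number of pre-split regions

      // Prepend a hash-derived bucket id, e.g. "event-000123" -> "07|event-000123",
      // so monotonically increasing keys are spread across BUCKETS regions.
      public static byte[] salt(byte[] naturalKey) {
        int bucket = (Arrays.hashCode(naturalKey) & Integer.MAX_VALUE) % BUCKETS;
        return Bytes.add(Bytes.toBytes(String.format("%02d|", bucket)), naturalKey);
      }
    }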

What companies are using HBase to serve a customer-facing product?

2014-12-05 Thread jeremy p
Hey all, So, I'm currently evaluating HBase as a solution for querying a very large data set (think 60+ TB). We'd like to use it to directly power a customer-facing product. My question is threefold : 1) What companies use HBase to serve a customer-facing product? I'm not interested in evaluation

Re: Does compression ever improve performance?

2014-06-13 Thread jeremy p
en when taking into account the CPU overhead of compressing and decompressing). -Dima On Fri, Jun 13, 2014 at 10:35 AM, jeremy p wrote: Hey all, Right now, I'm not using compression on any of my tables, because our da

Does compression ever improve performance?

2014-06-13 Thread jeremy p
Hey all, Right now, I'm not using compression on any of my tables, because our data doesn't take up a huge amount of space. However, I would turn on compression if there was a chance it would improve HBase's performance. By performance, I'm talking about the speed with which HBase responds to re
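For anyone who wants to measure the effect, a hedged sketch (0.96+ client API, placeholder table and family names) of switching an existing column family to Snappy; the table is disabled for the schema change and major-compacted afterwards so existing HFiles are rewritten compressed.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.compress.Compression.Algorithm;
    import org.apache.hadoop.hbase.util.Bytes;

    public class EnableSnappy {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        TableName name = TableName.valueOf("my_table");             // placeholder table name
        HTableDescriptor desc = admin.getTableDescriptor(name);
        HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("d"));  // placeholder family name
        cf.setCompressionType(Algorithm.SNAPPY);
        admin.disableTable(name);
        admin.modifyTable(name, desc);       // push the updated descriptor
        admin.enableTable(name);
        admin.majorCompact("my_table");      // rewrite existing HFiles with the new compression
        admin.close();
      }
    }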

Re: How to specify a compression algorithm when creating a table with the HBaseAdmin object?

2014-06-12 Thread jeremy p
so going to update the documentation to reflect this. JM 2014-06-12 19:25 GMT-04:00 jeremy p : Awesome -- thank you both! --Jeremy On Wed, Jun 11, 2014 at 4:34 PM, Subbiah, Suresh wrote:

Re: How to specify a compression algorithm when creating a table with the HBaseAdmin object?

2014-06-12 Thread jeremy p
HBaseAdmin admin = new HBaseAdmin(config);
HTableDescriptor table = new HTableDescriptor(TableName.valueOf(TABLE_NAME));
table.addFamily(new HColumnDescriptor(CF_DEFAULT).setCompressionType(Algorithm.SNAPPY));
admin.createTable(table);
2014-06-11 17:47 GMT-04:00 jeremy p :

How to specify a compression algorithm when creating a table with the HBaseAdmin object?

2014-06-11 Thread jeremy p
I'm currently creating a table using the HBaseAdmin object. The reason I'm doing it with the HBaseAdmin object is that I need to pre-split the table by specifying the start key, end key, and number of regions. I want to use Snappy compression for this table; however, I haven't seen any way to do
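A hedged sketch of one way to do both at once, assuming the 0.96+ client API and placeholder table, family, and key names: compression is set on the HColumnDescriptor, while the pre-splitting comes from the createTable overload that takes a start key, end key, and region count.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.compress.Compression.Algorithm;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreatePresplitSnappyTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("my_table")); // placeholder
        HColumnDescriptor cf = new HColumnDescriptor("d");                           // placeholder family
        cf.setCompressionType(Algorithm.SNAPPY);   // compression is a column-family property
        desc.addFamily(cf);
        // Pre-split overload: start key, end key, number of regions.
        admin.createTable(desc, Bytes.toBytes("00"), Bytes.toBytes("99"), 16);
        admin.close();
      }
    }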

Re: How to get the splitKeys for a table?

2014-05-30 Thread jeremy p
Thanks -- I'll give that a shot. --Jeremy On Fri, May 30, 2014 at 11:59 AM, Ted Yu wrote: Take a look at the following method in HTable: public Pair<byte[][], byte[][]> getStartEndKeys() throws IOException { Cheers On Fri, May 30, 2014 at 11:51 AM, jeremy p

How to get the splitKeys for a table?

2014-05-30 Thread jeremy p
Let's say I created a table with this method in HBaseAdmin : createTable(HTableDescriptor desc, byte[][] splitKeys) Now, let's say I want to open up that table and get the array of splitKeys I used when creating the table. Is that possible? --Jeremy
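A small hedged sketch of the approach suggested in the reply above, with a placeholder table name: getStartEndKeys() returns the current region boundaries, and the start keys minus the leading empty one are the split points as the table exists now (which may differ from the original splitKeys if regions have since split).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.util.Pair;

    public class PrintRegionBoundaries {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_table");        // placeholder table name
        Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
        byte[][] startKeys = keys.getFirst();
        // startKeys[0] is the empty key of the first region; the rest are split points.
        for (int i = 1; i < startKeys.length; i++) {
          System.out.println(Bytes.toStringBinary(startKeys[i]));
        }
        table.close();
      }
    }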

Re: Re: How to pass multiple operations to HBase and be guaranteed of execution order

2014-03-18 Thread jeremy p
e is to execute them serially, where the latter is always issued after the former has returned successfully. ________ From: jeremy p [athomewithagroove...@gmail.com] Sent: March 14, 2014 6:39 To: user@hbase.apache.org Subject: How to pass multiple

How to pass multiple operations to HBase and be guaranteed of execution order

2014-03-13 Thread jeremy p
Hello all, The documentation for htable.batch() warns us : "The execution ordering of the actions is not defined. Meaning if you do a Put and a Get in the same batch() call, you will not necessarily be guaranteed that the Get returns what the Put had put." Is there a way to pass multiple get() a
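The reply above (translated) suggests the straightforward workaround; here is a minimal sketch under assumed table, family, and qualifier names: issue each operation individually and only send the next one after the previous call returns, instead of relying on batch() ordering.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class OrderedOps {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_table");   // placeholder table name
        table.setAutoFlush(true);                      // send each Put immediately, no client buffering
        byte[] row = Bytes.toBytes("row1");
        Put put = new Put(row);
        put.add(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        table.put(put);                                // returns after the server applies the Put
        Result result = table.get(new Get(row));       // issued strictly after the Put succeeded
        System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("q"))));
        table.close();
      }
    }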

Re: Is there a problem with having 4000 tables in a cluster?

2013-09-26 Thread jeremy p
. This way you avoid the hotspotting problem on HBase due to MapReduce sorting. On Tue, Sep 24, 2013 at 2:50 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote: Hi Jeremy,

Re: Is there a problem with having 4000 tables in a cluster?

2013-09-24 Thread jeremy p
e to 4000 different regions, which can be hosted in 4000 different servers if you have that. And there will be no hot-spotting? Then when you run an MR job, you will have one mapper per region. Each region will

Re: Is there a problem with having 4000 tables in a cluster?

2013-09-24 Thread jeremy p
ggiari <jean-m...@spaggiari.org> wrote: Hi Jeremy, I don't see any issue for HBase to handle 4000 tables. However, I don't think it's the best solution for your use case. JM 2013/9

Is there a problem with having 4000 tables in a cluster?

2013-09-24 Thread jeremy p
Short description : I'd like to have 4000 tables in my HBase cluster. Will this be a problem? In general, what problems do you run into when you try to host thousands of tables in a cluster? Long description : I'd like the performance advantage of pre-split tables, and I'd also like to do filter

Re: What happens when you add new HBase nodes to a cluster?

2013-05-15 Thread jeremy p
inal Message----- From: jeremy p [mailto:athomewithagroove...@gmail.com] Sent: Wednesday, May 15, 2013 2:26 PM To: user@hbase.apache.org Subject: What happens when you add new HBase nodes to a cluster? Hey all, We're wanting to add 10 additional nodes to our 20-

What happens when you add new HBase nodes to a cluster?

2013-05-15 Thread jeremy p
Hey all, We're wanting to add 10 additional nodes to our 20-node HBase cluster. Our tables are pre-split into 800 regions, 40 regions to a machine. What will happen when we add 10 new nodes to the cluster? Will the HBase load balancer automatically re-distribute these regions to the new nodes?