Re: How to limit columns returned by a single row in HBase

2014-07-19 Thread Ted Yu
If I understand SiMa's use case correctly, after top records for (file A)_ are returned, (file B)_ would be next. Therefore some kind of filter is needed server side to skip the remaining records for (file A)_. Another (corner) case is that for certain file prefix, there may not be as many records

Re: How to limit columns returned by a single row in HBase

2014-07-19 Thread Arun Allamsetty
Hi, I have an idea which might be just baloney, but people learn from mistakes and this is my attempt to learn. So if I properly understand the use case, you want to get the first 500 records pertaining to a file based on its file name. Since you want to limit the number of records written, I won

Re: How to limit columns returned by a single row in HBase

2014-07-19 Thread Ted Yu
You can write your own filter, based on ColumnCountGetFilter, without overriding the filterAllRemaining() method. In the filterKeyValue() method, when the count is bigger than the limit, the method returns NEXT_ROW. Your filter can remember the file prefix of the previous row. If file prefix of current row is the
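
The idea described above can be sketched roughly as follows. This is only a plain-Java simulation of the counting logic, not a real HBase filter: it does not extend any HBase class, the ReturnCode enum is a local stand-in with just the two values used here, and the names PrefixLimitSketch, PrefixLimitFilter, prefixOf and scan are hypothetical. It also assumes that row keys look like "<file>_<suffix>" and that the count resets whenever the file prefix changes, which is a guess at how the truncated message continues.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of the suggested filter logic (no HBase dependency).
final class PrefixLimitSketch {

    // Local stand-in for the filter return codes; only the two used here.
    enum ReturnCode { INCLUDE, NEXT_ROW }

    // Hypothetical filter: counts cells per file prefix, never ends the scan early.
    static final class PrefixLimitFilter {
        private final int limit;
        private String previousPrefix = null; // prefix of the previously seen row
        private int count = 0;                // cells seen for the current prefix

        PrefixLimitFilter(int limit) {
            this.limit = limit;
        }

        // Assumption: the file prefix is everything before the first '_'.
        static String prefixOf(String rowKey) {
            int i = rowKey.indexOf('_');
            return i < 0 ? rowKey : rowKey.substring(0, i);
        }

        // Called once per cell, in scan order.
        ReturnCode filterKeyValue(String rowKey) {
            String prefix = prefixOf(rowKey);
            if (!prefix.equals(previousPrefix)) {
                // Assumption: a new file prefix starts a fresh count.
                previousPrefix = prefix;
                count = 0;
            }
            count++;
            // Over the limit: skip the rest of this row, but keep scanning.
            return count > limit ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE;
        }
    }

    // Simulated scan over sorted {rowKey, qualifier} pairs.
    static List<String> scan(List<String[]> cells, int limit) {
        PrefixLimitFilter filter = new PrefixLimitFilter(limit);
        List<String> included = new ArrayList<>();
        String skippedRow = null; // row whose remaining cells are being skipped
        for (String[] cell : cells) {
            String rowKey = cell[0];
            if (rowKey.equals(skippedRow)) {
                continue; // NEXT_ROW was returned earlier for this row
            }
            if (filter.filterKeyValue(rowKey) == ReturnCode.INCLUDE) {
                included.add(rowKey + ":" + cell[1]);
            } else {
                skippedRow = rowKey;
            }
        }
        return included;
    }

    public static void main(String[] args) {
        List<String[]> cells = Arrays.asList(
            new String[] {"fileA_1", "c1"},
            new String[] {"fileA_1", "c2"},
            new String[] {"fileA_1", "c3"},
            new String[] {"fileA_2", "c1"},
            new String[] {"fileB_1", "c1"});
        System.out.println(scan(cells, 2));
    }
}
```

With a limit of 2, the two first cells for fileA are kept, the remaining fileA cells are skipped row by row, and the scan still reaches fileB because nothing signals that all remaining rows should be filtered. In a real filter the same counting would live in filterKeyValue() of a class derived from the HBase filter base class.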

How to limit columns returned by a single row in HBase

2014-07-19 Thread SiMaYunRui
Hi experts, I have a wide-flat table, and during a scan, how can I limit the columns returned per row, instead of across all rows (which is what ColumnCountGetFilter does)? I need to scan multiple rows at the same time and do the aggregation on the client side. For more background, I am designing an

Re: Cluster sizing guidelines

2014-07-19 Thread Andrew Purtell
The i2.8xlarge and hs1.8xlarge EC2 instance types would provide opportunity for testing what really happens today when you attempt a high density storage architecture with HDFS and HBase. The hs1 type has 24 spinning disks. I think the i2.8xlarge better represents near-future challenges in effec

Re: Cluster sizing guidelines

2014-07-19 Thread lars hofhansl
Yeah. Right direction. Correct on 3 counts. Should have read all email before I replied to your earlier one. From: Amandeep Khurana To: "user@hbase.apache.org" Sent: Thursday, July 17, 2014 11:36 AM Subject: Re: Cluster sizing guidelines On Wed, Jul 16,

Re: Cluster sizing guidelines

2014-07-19 Thread lars hofhansl
We can answer #3 at least: You can store about 2T of effective data per node in HBase, unless you have a mostly read-only load. See here for the reasoning: http://hadoop-hbase.blogspot.de/2013/01/hbase-region-server-memory-sizing.html For #1 we were traditionally bound by disk/network IO (due to

ANNOUNCE: Tephra for HBase transactions

2014-07-19 Thread Gary Helmling
Hi all, I would like to introduce a new open source project, Continuuity Tephra, which provides scalable, distributed transactions for Apache HBase. Tephra provides "snapshot isolation" for concurrent transactions spanning multiple regions, tables, and RPC calls. A central transaction manager pr