If I understand SiMa's use case correctly, after top records for (file A)_
are returned, (file B)_ would be next.
Therefore some kind of server-side filter is needed to skip the remaining
records for (file A)_.
Another (corner) case is that for a certain file prefix, there may not be as
many records
Hi,
I have an idea which might be just baloney, but people learn from mistakes
and this is my attempt to learn. So if I properly understand the user's use case,
you want to get the first 500 records pertaining to a file based on its
file name. Since you want to limit the number of records written, I won
You can write your own filter, based on ColumnCountGetFilter, by not
overriding the filterAllRemaining() method.
In the filterKeyValue() method, when the count is bigger than the limit, the method
returns NEXT_ROW.
Your filter can remember the file prefix of the previous row. If the file
prefix of the current row is the
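The per-prefix limit discussed above can be sketched in plain Java, without any HBase dependency. This is only an illustration of the bookkeeping, not an actual HBase filter: the class name PrefixLimitSketch, the Decision enum (a stand-in for the filter return codes), the '_' separator between file prefix and record id, and the assumption that the counter resets whenever the prefix changes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; it does not extend any HBase class.
class PrefixLimitSketch {
    // Stand-in for a filter return code: keep the row or skip it.
    enum Decision { INCLUDE, NEXT_ROW }

    private final int limit;          // max rows to keep per file prefix
    private String previousPrefix;    // prefix seen on the previous row
    private int count;                // rows seen so far for that prefix

    PrefixLimitSketch(int limit) {
        this.limit = limit;
    }

    // Assumption: row keys look like "<file prefix>_<record id>".
    static String prefixOf(String rowKey) {
        int i = rowKey.indexOf('_');
        return i < 0 ? rowKey : rowKey.substring(0, i);
    }

    // Called once per row, in sorted row-key order.
    Decision filterRow(String rowKey) {
        String prefix = prefixOf(rowKey);
        if (!prefix.equals(previousPrefix)) {
            // New file prefix: remember it and restart the count.
            previousPrefix = prefix;
            count = 0;
        }
        count++;
        return count > limit ? Decision.NEXT_ROW : Decision.INCLUDE;
    }

    // Simulates a scan over sorted row keys, keeping at most 'limit' per prefix.
    static List<String> scan(List<String> sortedRowKeys, int limit) {
        PrefixLimitSketch filter = new PrefixLimitSketch(limit);
        List<String> kept = new ArrayList<>();
        for (String key : sortedRowKeys) {
            if (filter.filterRow(key) == Decision.INCLUDE) {
                kept.add(key);
            }
        }
        return kept;
    }
}
```

Because the filter never reports that all remaining rows are filtered, the scan continues past the skipped rows of one prefix and picks up the next prefix. A prefix with fewer rows than the limit is simply returned in full.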
Hi experts,
I have a wide, flat table. During a scan, how can I limit the columns returned
per row, rather than across all rows (which is what ColumnCountGetFilter does)? I
need to scan multiple rows at the same time and do the aggregation on the
client side.
To give more background, I am designing an
The i2.8xlarge and hs1.8xlarge EC2 instance types would provide an opportunity for
testing what really happens today when you attempt a high-density storage
architecture with HDFS and HBase. The hs1 type has 24 spinning disks. I think
the i2.8xlarge better represents near-future challenges in effec
Yeah. Right direction. Correct on 3 counts. Should have read all email before I
replied to your earlier one.
From: Amandeep Khurana
To: "user@hbase.apache.org"
Sent: Thursday, July 17, 2014 11:36 AM
Subject: Re: Cluster sizing guidelines
On Wed, Jul 16,
We can answer #3 at least: You can store about 2T of effective data per node in
HBase, unless you have a mostly read-only load.
See here for the reasoning:
http://hadoop-hbase.blogspot.de/2013/01/hbase-region-server-memory-sizing.html
For #1 we were traditionally bound by disk/network IO (due to
Hi all,
I would like to introduce a new open source project, Continuuity Tephra,
which provides scalable, distributed transactions for Apache HBase.
Tephra provides "snapshot isolation" for concurrent transactions spanning
multiple regions, tables, and RPC calls. A central transaction manager
pr