>
>
> Totally random (even on keys that do not exist).
>
It's worth checking whether that matches your real use cases. I expect that
reads by row key are most of the time on existing rows (as with a traditional
db relationship, or UI- or workflow-driven stuff), even if I'm sure it's
possible to have something t
Thanks for the detailed reply, Harsh.
Some further comments / thoughts,
1. For the Scan used in the mapper/reducer, supposing we configure a batch
size of 500, I am not sure whether the 500 items returned in one batch call
must all come from one region server, or whether they could come from
multiple region servers.
Hi Lars:
Thanks for the reply.
I need to work out whether I misunderstood the perceived inefficiency,
because it seems you don't see it quite the same way.
Let's say, as an example, we have 1 row with 2 columns (col-1 and col-2) in a
table, and each column has 1000 versions. Using the following code (the cod
On Mon, Aug 27, 2012 at 8:30 PM, anil gupta wrote:
> Hi All,
>
> Here are the steps i followed to load the table with HFilev1 format:
> 1. Set the property hfile.format.version to 1.
> 2. Updated the conf across the cluster.
> 3. Restarted the cluster.
> 4. Ran the bulk loader.
>
> Table has 34 mi
Regards to all the list.
Well, you should ask the Tumblr folks: they use a combination
of MySQL and HBase for their blogging platform. They talked about this
topic at the last HBaseCon. Here is the link:
http://www.hbasecon.com/sessions/growing-your-inbox-hbase-at-tumblr/
Blake Mathen
Hi ,
I am in the process of writing my first bulk loading job. I use Cloudera
CDH3U3 with HBase 0.90.4.
After the job finishes I can see the HFiles it created, but there are no
entries in HBase; hbase shell >> count 'uu_bulk' returns 0.
Here is my job configuration:
Configuration
On Aug 25, 2012, at 2:57 PM, lars hofhansl wrote:
> Each column family is its own store. All stores are flushed together, so
> having many adds overhead (especially if a few tend to hold a lot of data,
> but the others don't, leading to very many small store files that need to be
> compacted).
I
Hi,
You need to complete the bulk load.
Check out http://hbase.apache.org/book/arch.bulk.load.html 9.8.2
Igal.
On Tue, Aug 28, 2012 at 7:29 PM, Oleg Ruchovets wrote:
> Hi ,
>I am on process to write my first bulk loading job. I use Cloudera
> CDH3U3 with hbase 0.90.4
>
> Executing a job I se
On Tue, Aug 28, 2012 at 9:59 AM, Joe Pallas wrote:
>
> On Aug 25, 2012, at 2:57 PM, lars hofhansl wrote:
>
>> Each column family is its own store. All stores are flushed together, so
>> having many adds overhead (especially if a few tend to hold a lot of data, but
>> the others don't, leading to ve
What I was saying was: It depends. :)
First off, how do you get to 1000 versions? In 0.94++, older versions are pruned
upon flush, so you need 333 flushes (assuming 3 versions on the CF) to get 1000
versions.
By that time some compactions will have happened and you're back to close to 3
versions
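For reference, the version count lars mentions is a per-column-family setting. A sketch in the HBase shell (table and family names are placeholders; older releases require disabling the table before altering it):

```
hbase> disable 'mytable'
hbase> alter 'mytable', {NAME => 'cf', VERSIONS => 3}
hbase> enable 'mytable'
hbase> describe 'mytable'
```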
Could it be the addition of the memstoreTS? I forget if that is in v1 as
well.
Matt
On Tue, Aug 28, 2012 at 7:37 AM, Stack wrote:
> On Mon, Aug 27, 2012 at 8:30 PM, anil gupta wrote:
> > Hi All,
> >
> > Here are the steps i followed to load the table with HFilev1 format:
> > 1. Set the proper
Are we terribly concerned about 3.5% of extra disk usage?
HFileV2 was designed to be more main-memory efficient, which is in much shorter
supply than disk space (bloom filters and index blocks are interspersed with
data blocks and loaded only when needed, etc.).
The stored MemstoreTS was introduced in
On Tue, Aug 28, 2012 at 11:42 AM, lars hofhansl wrote:
> Are we terribly concerned about 3.5% of extra disk usage?
> HFileV2 was designed to be more main memory efficient, which is in much
> shorter supply than disk space (bloom filters and index blocks are
> interspersed with data blocks and lo
I think the memstoreTS is stored with each KV (until it can be proven to be
unneeded because there are no older open scanners, in which case it is not
written during the next compaction and assumed to be 0).
"Mild passing interest" :) Yep.
From: Stack
To: user@hbase.apac
Thanks for the info, Karthik (and sorry that I didn’t see it for so long, it
got auto-filed).
I think the reasoning behind the native client approach makes sense. I don’t
know how much of the extra hop overhead is network and how much is
serialization/deserialization, so for now I have been ho
I would still caution against relying on the sort order between values with
the same cf, qualifier, and timestamp. If, for example, there is a Delete, it
will eclipse subsequent Puts with the same timestamp, even though the Put
happened after the Delete.
Enis
On Mon, Aug 27, 2012 at 9:20 AM, Tom Brown wrote:
>
Hi Igal, thank you for the quick response.
Can I execute this step programmatically?
From the link you sent:
9.8.5. Advanced Usage
Although the importtsv tool is useful in many cases, advanced users may
want to generate data programmatically, or import data from other formats.
To get started
Hi Lars:
I see. Please refer to the inline comment below.
Best Regards,
Jerry
On Tue, Aug 28, 2012 at 2:21 PM, lars hofhansl wrote:
> What I was saying was: It depends. :)
>
> First off, how do you get to 1000 versions? In 0.94++ older version are
> pruned upon flush, so you need 333 flushes
As suggested by the book, take a look at the
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles class.
This tool expects two arguments: (1) the path to the generated HFiles (in
your case it's outputPath), and (2) the target table.
To use it programmatically, you can either invoke it via the ToolRunner
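A minimal sketch of both programmatic routes against the 0.90.x API, reusing the thread's outputPath and 'uu_bulk' names (this needs a running cluster and the HBase client jars on the classpath, so treat it as a sketch rather than a tested program):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.util.ToolRunner;

public class CompleteBulkLoad {
    public static void main(String[] args) throws Exception {
        String outputPath = args[0];          // dir the MR job wrote HFiles to
        Configuration conf = HBaseConfiguration.create();

        // Route 1: call the tool directly after the job completes.
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path(outputPath), new HTable(conf, "uu_bulk"));

        // Route 2: drive it through ToolRunner, as from the command line.
        // ToolRunner.run(conf, new LoadIncrementalHFiles(conf),
        //                new String[] { outputPath, "uu_bulk" });
    }
}
```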
Hi
I probably know the usual answer, but are there any tricks to do some sort of
sort-by-value in HBase? The only option I know of is to somehow embed the value
in the key part. The value is not a timestamp but a normal number.
I want to find, say, the top 10 from a range of columns. The range could be
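One common realization of the embed-the-value-in-the-key trick is a fixed-width, order-preserving byte encoding. HBase compares row keys as unsigned bytes, so a big-endian long with its sign bit flipped sorts in numeric order; class and method names below are illustrative, not HBase API:

```java
// Sketch of an order-preserving key encoding for "sort by value" via keys.
public class ValueKeys {

    // Ascending: flip the sign bit, then write big-endian.
    public static byte[] ascending(long v) {
        long u = v ^ Long.MIN_VALUE;   // map signed order onto unsigned order
        byte[] b = new byte[8];
        for (int i = 7; i >= 0; i--) {
            b[i] = (byte) u;
            u >>>= 8;
        }
        return b;
    }

    // Descending: complement every byte of the ascending form, so a plain
    // forward scan returns the largest values first (handy for "top 10").
    public static byte[] descending(long v) {
        byte[] b = ascending(v);
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) ~b[i];
        }
        return b;
    }

    // Unsigned lexicographic compare, the same order HBase uses for keys.
    public static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return 0;
    }
}
```

With the descending form as a row-key prefix, a Scan with a small caching value and an early stop gives the top N without touching the rest of the range.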
Check out OpenTSDB at StumbleUpon, described by Benoit "tsuna" Sigoure
(ts...@stumbleupon.com) in the
HBaseCon talk called "Lessons Learned from OpenTSDB".
His team has done a great job working with time-series data, and he
gave a lot of great advice on working with this kind of data in HBase:
How does it deal with multiple writes in the same millisecond for the same
rowkey/column? I can't find that info.
On Tue, Aug 28, 2012 at 5:33 PM, Marcos Ortiz wrote:
> Study the OpenTSDB at StumbleUpon described by Benoit "tsuna" Sigoure (
> ts...@stumbleupon.com) in the
> HBaseCon talk called
Ping?
2012/8/28 某因幡 :
> Thanks for your quick reply.
> The co-processor looks like:
> public void postGet(final ObserverContext e,
> final Get get, final List results)
> {
> if table is X
> get some columns from table Y
> add these columns to results
> }
> And s
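Fleshing out the pseudocode above against the 0.92/0.94-era coprocessor API might look something like this. This is a hedged sketch, not the poster's actual code: the table names "X" and "Y" are placeholders, and it assumes the joined row in Y shares the row key of the Get on X:

```java
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.Bytes;

public class JoinObserver extends BaseRegionObserver {
    @Override
    public void postGet(ObserverContext<RegionCoprocessorEnvironment> e,
                        Get get, List<KeyValue> results) throws IOException {
        String table = e.getEnvironment().getRegion()
                        .getTableDesc().getNameAsString();
        if (!"X".equals(table)) {
            return;                       // only augment reads of table X
        }
        // Open table Y through the coprocessor environment's connection.
        HTableInterface y = e.getEnvironment().getTable(Bytes.toBytes("Y"));
        try {
            // Fetch the matching row from Y and merge its cells in.
            Result r = y.get(new Get(get.getRow()));
            for (KeyValue kv : r.raw()) {
                results.add(kv);
            }
        } finally {
            y.close();
        }
    }
}
```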
Can you give an example of what you are trying to do, how you would
use both writes coming in at the same instant for the same cell,
and why you say that the nanosecond approach is tricky?
On Aug 28, 2012, at 5:54 PM, Mohit Anchlia wrote:
> How does it deal with multiple writes in the s
Did you find any clues in the region server logs?
You can pastebin a snippet and post the link here.
On Aug 28, 2012, at 6:54 PM, 某因幡 wrote:
> Ping?
>
> 2012/8/28 某因幡 :
>> Thanks for your quick reply.
>> The co-processor looks like:
>> public void postGet(final ObserverContext e,
>>
Hi Jay
I'm not quite clear on exactly what the problem is, because I am not
able to find much of a difference. How are you checking the time taken?
When there are multiple scanners running in parallel, there is a chance for
the client to become a bottleneck, as it may not be able to handle so many
req
Hi Jerry,
my answer will be the same again:
Some folks will want the max versions set by the client to apply before
filters, and some folks will want it to restrict the end result.
It's not possible to have it both ways; your filter needs to do the right thing.
There's a lot of discussion around th
Hi
What is the difference between hbase-0.94.1 and hbase-0.94.1-security?
Regards
0.94.1-security includes the Security and AccessController features, if you
configure them. With those, the HBase administrator can manage table
permissions (read/write/admin), much like in MySQL.
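With the security build and the AccessController enabled, grants are issued from the HBase shell. A sketch with placeholder user and table names:

```
hbase> grant 'bob', 'RW', 'mytable'      # read/write on mytable
hbase> user_permission 'mytable'         # list current grants
hbase> revoke 'bob', 'mytable'           # take the grant back
```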
On Wed, Aug 29, 2012 at 11:35 AM, Everist wrote:
> Hi
>
>
>
> The difference between hbase-0.94.1 and hbase-