On Tue, Sep 22, 2009 at 10:10 PM, stchu wrote:
> Hi Stack and Erik,
>
> Thanks for your answers. I think the timestamp is also contained in mapfiles
> (in binary format?),
> am I right?

Yes, it's a serialized long.
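(As a hedged aside: since it is just an 8-byte, big-endian long, the Bytes utility round-trips it; the variable names below are illustrative.)

  import org.apache.hadoop.hbase.util.Bytes;

  long now = System.currentTimeMillis();
  byte[] raw = Bytes.toBytes(now); // the 8 bytes as they sit on disk
  long ts = Bytes.toLong(raw);     // decodes back to the original timestamp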
> HFile looks better. I will migrate my prog. to Hadoop 0.20 and HBase 0.20
Hi Stack and Erik,
Thanks for your answers. I think the timestamp is also contained in mapfiles
(in binary format?),
am I right?
HFile looks better. I will migrate my prog. to Hadoop 0.20 and HBase 0.20
after I finish my experiments in 0.19.
But it needs some effort for those incompatible APIs...
Hi, I am looking for a better search suggestion. Here is my data:
id   book   author
1    abc    me
2    def    me
3    ghi    you
And here is my HBase table data:
1    id:1 = 1    book:abc = abc    author:me = me
2    id:2 = 2    book:def = def    author:me = me
3    id:3 = 3    book:ghi = ghi    author:you = you
when I wa
(Funny, I read the 2MB as 2GB -- yeah, why so small Guy?)
On Tue, Sep 22, 2009 at 4:59 PM, Jonathan Gray wrote:
> Is there a reason you have the split size set to 2MB? That's rather small
> and you'll end up constantly splitting, even once you have good
> distribution.
>
> I'd go for pre-splitting, as others suggest, but with larger region sizes.
Is there a reason you have the split size set to 2MB? That's rather
small and you'll end up constantly splitting, even once you have good
distribution.
I'd go for pre-splitting, as others suggest, but with larger region sizes.
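For illustration, a hedged sketch of raising the split threshold (hbase.hregion.max.filesize is the property that governs it in the 0.19/0.20-era configs; the 1 GB value is just an example):

  import org.apache.hadoop.hbase.HBaseConfiguration;

  HBaseConfiguration conf = new HBaseConfiguration();
  // 1 GB regions instead of 2 MB; set the same key in hbase-site.xml
  // on a real cluster so every node agrees
  conf.setLong("hbase.hregion.max.filesize", 1024L * 1024 * 1024);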
Ryan Rawson wrote:
An interesting thing about HBase is it really performs better with
more data. Pre-splitting tables is one way.
Another performance bottleneck is the write-ahead log. You can disable
it by calling:
Put.setWriteToWAL(false);
and you will achieve a substantial speedup.
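For context, a minimal sketch of how that call fits into a 0.20-style write path (the table, family, and values below are made up):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  HTable table = new HTable(new HBaseConfiguration(), "mytable");
  Put put = new Put(Bytes.toBytes("row1"));
  put.add(Bytes.toBytes("family"), Bytes.toBytes("qualifier"), Bytes.toBytes("value"));
  put.setWriteToWAL(false); // skip the WAL: faster, but edits are lost if a region server dies
  table.put(put);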
Good luck!
-ryan
On Tue, Sep
Split your table in advance? You can do it from the UI or shell (Script
it?)
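If you script it, a hedged sketch against the client API (assuming the 0.20-era HBaseAdmin.split call; the table name is invented):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
  admin.split("mytable"); // asks the master to split the table's regions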
Regarding the same performance for 10 nodes as for 5 nodes: how many regions
are in your table? What happens if you pile on more data?
The split algorithm will be sped up in coming versions for sure. Two
minutes seems like a lot.
Hello all,
I've been working with HBase for the past few months on a proof of
concept/technology adoption evaluation. I wanted to describe my
scenario to the user/development community to get some input on my
observations.
I've written an application that is comprised of two tables
Greetings Jon,
A quick performance snapshot: I believe with our cluster of 18 nodes (8
cores, 8 GB RAM, 2 x 500 GB drives per node), we were inserting rows of
about 5-10 KB at a rate of 180,000/second. That's on a completely untuned
cluster. You could see much better performance with proper tweaking.
Serializing even a large list into one column is not necessarily a bad thing.
The thing is, when you update that column you have to rewrite the whole
thing. If you expect lots of items and frequent updates, it might be better
to store each item in its own column, as Stack says above.
Another question you c
Would a family devoted to your list -- called 'list'! -- work for you? You
could get individual members of the list by doing list:membername or get
them all by getting all elements of the family, etc.
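A hedged sketch of that layout with the 0.20 client API (the row key, member name, and value are invented):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  HTable table = new HTable(new HBaseConfiguration(), "mytable");

  // one column per list member, all under the 'list' family
  Put put = new Put(Bytes.toBytes("myrow"));
  put.add(Bytes.toBytes("list"), Bytes.toBytes("membername"), Bytes.toBytes("value"));
  table.put(put);

  // fetch every member back with a single Get on the family
  Get get = new Get(Bytes.toBytes("myrow"));
  get.addFamily(Bytes.toBytes("list"));
  Result members = table.get(get);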
St.Ack
On Tue, Sep 22, 2009 at 9:57 AM, Keith Thomas wrote:
>
> I have a family which contains an array, or list, of values.
Hi all,
I was looking at the HBase Goes Realtime presentation yesterday and came
across these numbers:
Tall Table: 1 million rows with a single column
* Insert - 0.24 ms per row
* Read - 1.42 ms per row
* Full Scan - 11 seconds
Wide Table: 1,000 rows with 20,000 columns
* Insert - 312 ms per row
* R
I have a family which contains an array, or list, of values. I don't mind
whether it is an array or a list or even an arraylist :)
At the moment I have gone down the quick and dirty route of serializing my
list into one column. While functionally this works sufficiently well to
allow me to keep
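Picking up the serialize-into-one-column idea, here is a hedged sketch of that quick-and-dirty packing (plain java.io, nothing HBase-specific; "items" is assumed to be a List<String>):

  import java.io.ByteArrayOutputStream;
  import java.io.DataOutputStream;
  import java.util.List;

  // pack a List<String> into one byte[] to store under a single column
  ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  DataOutputStream out = new DataOutputStream(buffer);
  out.writeInt(items.size());   // member count first
  for (String item : items) {
    out.writeUTF(item);         // then each member
  }
  out.close();
  byte[] cellValue = buffer.toByteArray();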
Yes, what Erik said. MapFile is a binary format. What you have is some
preamble up front listing the key and value class types plus some
miscellaneous metadata. Then, per key and value, these are serialized
Writable types.
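To make that concrete, a hedged sketch of reading such a file with the stock Hadoop MapFile.Reader (the path is hypothetical, and the key/value classes named in the preamble must be on your classpath):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.io.MapFile;
  import org.apache.hadoop.io.Writable;
  import org.apache.hadoop.io.WritableComparable;
  import org.apache.hadoop.util.ReflectionUtils;

  Configuration conf = new Configuration();
  FileSystem fs = FileSystem.get(conf);
  MapFile.Reader reader =
      new MapFile.Reader(fs, "/hbase/mytable/region/family/mapfile", conf);
  // instantiate the key/value types recorded in the file's preamble
  WritableComparable key =
      (WritableComparable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
  Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
  while (reader.next(key, value)) {
    System.out.println(key + " => " + value); // prints via each type's toString()
  }
  reader.close();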
Move to HBase 0.20.0. It uses HFile instead of MapFile. There is a nice
l
Hey Stchu!
Not exactly sure what the "messy code" is, except that it looks like
non-printable binary data. Depending on where you look, I think it is
values, offsets, etc.
The reason that we are keeping the family stored in the files is to leave
the door open for something called locality groups. The
Hi,
I use Hadoop 0.19.1 and HBase 0.19.3.
I wrote a simple table which has 2 column families (Level0:trail_id,
Level1:trail_id).
And I put the data (4 rows) into the HBase table:
120_25      column=Level0:trail_id, timestamp=2009091613240001, value=3;21234
121.1_23.4
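For reference, a hedged sketch of how a row like the first one would have been written with the 0.19-era client API (BatchUpdate was the pre-0.20 write interface; the table name is invented):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.io.BatchUpdate;
  import org.apache.hadoop.hbase.util.Bytes;

  HTable table = new HTable(new HBaseConfiguration(), "mytable");
  BatchUpdate update = new BatchUpdate("120_25");             // row key
  update.put("Level0:trail_id", Bytes.toBytes("3;21234"));    // family:qualifier, value
  table.commit(update);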