Thank you very much for your effort!
So it really depends on what you want to use it for. If you're
thinking about it, you probably have some kind of scale issues.
Not at the moment. Actually our software runs on a single server, web
server/database/file storage/lucene side by side. But we're
Hi,
Planet-scale data explorations and data mining operations will almost
always need to include some sequential scans. So how can we speed
up sequential scans? The BigTable paper shows how:
* Column-oriented storage (it reduces I/O)
* Data compression
* PDP (parallel distributed processing)
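To make the first point concrete, here is a minimal sketch of why a column-oriented layout reduces I/O for a scan that touches only one field. It is a toy in-memory model, not how BigTable or HBase store data; the table, the field names, and the "values read" counter are all assumptions made for illustration.

```python
# Toy table: 1,000 rows with 4 fields each (all names here are hypothetical).
ROWS = [
    {"user_id": i, "name": f"user{i}", "age": 20 + i % 50, "city": "X"}
    for i in range(1000)
]


def row_scan_sum(rows, field):
    """Row-oriented scan: model every field of every row as being read."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row is fetched to get one field
        total += row[field]
    return total, values_read


def to_columns(rows):
    """Re-lay out the same data so each field is stored contiguously."""
    return {field: [row[field] for row in rows] for field in rows[0]}


def column_scan_sum(columns, field):
    """Column-oriented scan: only the requested column is read."""
    column = columns[field]
    return sum(column), len(column)


COLUMNS = to_columns(ROWS)
row_total, row_reads = row_scan_sum(ROWS, "age")
col_total, col_reads = column_scan_sum(COLUMNS, "age")

# Same answer, but the column layout touches a quarter of the values.
print(row_total == col_total, row_reads, col_reads)  # True 4000 1000
```

Compression and parallel processing stack on top of this: a single column tends to hold similar values, so it usually compresses better, and separate row ranges can be scanned by separate machines.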
Discussion inline.
Your example with the friends makes perfect sense. Can you imagine a
scenario where storing the data in a column-oriented instead of a
row-oriented db (a counterexample, if you will) causes as huge a
performance mismatch as the friends one in the row/column comparison?
Thank you, but I still don't get it.
I've read tons of websites and papers, but there's no clear and
well-founded answer as to why one should use BigTable instead of
relational databases. MySQL Cluster seems to offer the same scalability
and level of abstraction, without switching to a non-relational paradigm.
I'm no expert, but maybe I can explain it the way I see it, maybe it
will resonate with other newbies like me :) Sorry if it's long winded,
or boring for those who already know all this.
BigTable and Hadoop are inherently sharded and distributed. They are
architected to store the data in
A few very big differences...
- HBase/BigTable don't have transactions in the same way that a relational
database does. While it is possible (and was just recently implemented for
HBase, see HBASE-669) it is not at the core of this design. A major bottleneck
of distributed multi-master
Thanks a lot for all replies, this is really helpful.
As you describe it, it's a problem of implementation. BigTable is
designed to scale; there are routines to shard the data and distribute it
to the pool of connected servers. Could MySQL perhaps decide tomorrow to
implement something similar
On Tue, Aug 19, 2008 at 9:44 AM, Mork0075 [EMAIL PROTECTED] wrote:
Can you please explain why someone should use HBase for horizontal
scaling instead of a relational database? One reason for me would be
that I don't have to implement the sharding logic myself. Are there others?
A slight
Stuart,
In general you will get a quicker response to HBase questions by posting them
to the HBase mailing list ([EMAIL PROTECTED]) see
http://hadoop.apache.org/hbase/mailing_lists.html for how to subscribe.
Perhaps the best document on scaling HBase is actually the Bigtable paper:
Thanks, this was really informative :)
Bigtable uses both. First it splits row ranges based on size. It also has the
ability to detect hot row ranges and will split a region if it becomes too hot.
This is tricky because you don't want to have a hot range split off and then
have it drop below
I've read some papers and tutorials this week and now have some concrete
questions:
(1) Sharding is also available in common relational systems. It is often
described as requiring an application layer for the federation of the
shards. I understand HBase as this layer, which implements the
whole
Please note that you will get a prompt response about HBase questions if you
ask them on the HBase user list ( [EMAIL PROTECTED] )
-Original Message-
From: Mork0075 [mailto:[EMAIL PROTECTED]
Sent: Sunday, August 17, 2008 11:51 PM
To: core-user@hadoop.apache.org
Subject: Re: Why is
Hello,
can someone please explain or point me to some documentation or
papers where I can read well-proven facts about why scaling a relational db
is so hard and scaling a document-oriented db isn't?
So perhaps if I got lots of requests to my relational db, I would
duplicate it to several
Mork0075 wrote:
Hello,
can someone please explain or point me to some documentation or
papers where I can read well-proven facts about why scaling a relational db
is so hard and scaling a document-oriented db isn't?
http://labs.google.com/papers/bigtable.html
relational dbs are great for
14 matches