Andrew Newman wrote:
They are comparing an indexed system with one that isn't. Why is
Hadoop faster at loading than the others? Surely no one would be
surprised that it would be slower - I'm surprised at how well Hadoop
does. Who wants to write a paper for next year, "grep vs reverse
index"?
There's definitely a false dichotomy to this paper, and I think it's a
tad disingenuous. It's titled "A Comparison Of Approaches To Large
Scale Data Analysis", when it should be titled "A Comparison of
Parallel RDBMSs to MapReduce for RDBMS-specific problems". There's
little surprise that the peopl
I agree with you, Andy.
This seems to be a great look into what Hadoop MapReduce is not good at.
Over in the HBase world, we constantly deal with comparisons like this to
RDBMSs, trying to determine if one is better than the other. It's a false
choice and completely depends on the use case.
Had
Not sure if comparing Hadoop to databases is an apples to apples
comparison. Hadoop is a complete job execution framework, which collocates
the data with the computation. I suppose DBMS-X and Vertica do that to some
extent, by way of SQL, but you're restricted to that. If you want
to say
I think there is one important comparison missing in the paper: cost. The
paper does mention in the conclusion that Hadoop comes for free, but doesn't
give any details of how much it would cost to get a license for Vertica or
DBMS X to run on 100 nodes.
Further, with data warehouse products like Hive
They are comparing an indexed system with one that isn't. Why is
Hadoop faster at loading than the others? Surely no one would be
surprised that it would be slower - I'm surprised at how well Hadoop
does. Who wants to write a paper for next year, "grep vs reverse
index"?
2009/4/15 Guilherme Germ
On Apr 14, 2009, at 12:47 PM, Guilherme Germoglio wrote:
Hi Brian,
I'm sorry but it is not my paper. :-) I've posted the link here because
we're always looking for comparison data -- so, I thought this benchmark
would be welcome.
Ah, sorry, I guess I was being dense when looking at the
Hi Brian,
I'm sorry but it is not my paper. :-) I've posted the link here because
we're always looking for comparison data -- so, I thought this benchmark
would be welcome.
Also, I won't attend the conference. However, it would be a good idea for
someone who will attend to ask the authors directly all
Hey Guilherme,
It's good to see comparisons, especially as it helps folks understand
better what tool is the best for their problem. As you show in your
paper, a MapReduce system is hideously bad at performing tasks that
column-store databases were designed for (selecting a single value
I thought it a conspicuous omission to not discuss the cost of
various approaches. Hadoop is free, though you have to spend
developer time; how much does Vertica cost on 100 nodes?
-Bryan
On Apr 14, 2009, at 7:16 AM, Guilherme Germoglio wrote:
(Hadoop is used in the benchmarks)
http://dat
Thanks for sharing this - I find these comparisons really interesting.
I have a small comment after skimming this very quickly.
[Please accept my apologies for commenting on such a trivial thing,
but personal experience has shown this really influences performance]
One thing not touched on in the
(Hadoop is used in the benchmarks)
http://database.cs.brown.edu/sigmod09/
There is currently considerable enthusiasm around the MapReduce
(MR) paradigm for large-scale data analysis [17]. Although the
basic control flow of this framework has existed in parallel SQL
database management systems (DBM