Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-21 Thread Steve Loughran
Andrew Newman wrote: They are comparing an indexed system with one that isn't. Why is Hadoop faster at loading than the others? Surely no one would be surprised that it would be slower - I'm surprised at how well Hadoop does. Who wants to write a paper for next year, "grep vs reverse index"? 2

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-17 Thread Bradford Stephens
There's definitely a false dichotomy to this paper, and I think it's a tad disingenuous. It's titled "A Comparison Of Approaches To Large Scale Data Analysis", when it should be titled "A Comparison of Parallel RDBMSs to MapReduce for RDBMS-specific problems". There's little surprise that the peopl

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-15 Thread Jonathan Gray
I agree with you, Andy. This seems to be a great look into what Hadoop MapReduce is not good at. Over in the HBase world, we constantly deal with comparisons like this to RDBMSs, trying to determine if one is better than the other. It's a false choice and completely depends on the use case. Had

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-15 Thread Andy Liu
Not sure if comparing Hadoop to databases is an apples-to-apples comparison. Hadoop is a complete job execution framework, which collocates the data with the computation. I suppose DBMS-X and Vertica do that to a certain extent, by way of SQL, but you're restricted to that. If you want to say

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-14 Thread Tarandeep Singh
I think there is one important comparison missing in the paper: cost. The paper does mention in the conclusion that Hadoop comes for free, but it doesn't give any details of how much it would cost to get a license for Vertica or DBMS X to run on 100 nodes. Further, with data warehouse products like Hive

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-14 Thread Andrew Newman
They are comparing an indexed system with one that isn't. Why is Hadoop faster at loading than the others? Surely no one would be surprised that it would be slower - I'm surprised at how well Hadoop does. Who wants to write a paper for next year, "grep vs reverse index"? 2009/4/15 Guilherme Germ

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-14 Thread Brian Bockelman
On Apr 14, 2009, at 12:47 PM, Guilherme Germoglio wrote: Hi Brian, I'm sorry but it is not my paper. :-) I've posted the link here because we're always looking for comparison data -- so, I thought this benchmark would be welcome. Ah, sorry, I guess I was being dense when looking at the

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-14 Thread Guilherme Germoglio
Hi Brian, I'm sorry but it is not my paper. :-) I've posted the link here because we're always looking for comparison data -- so, I thought this benchmark would be welcome. Also, I won't attend the conference. However, it would be a good idea for someone who will attend to ask the authors directly all

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-14 Thread Brian Bockelman
Hey Guilherme, It's good to see comparisons, especially as it helps folks understand better which tool is best for their problem. As you show in your paper, a MapReduce system is hideously bad at performing tasks that column-store databases were designed for (selecting a single value

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-14 Thread Bryan Duxbury
I thought it a conspicuous omission to not discuss the cost of various approaches. Hadoop is free, though you have to spend developer time; how much does Vertica cost on 100 nodes? -Bryan On Apr 14, 2009, at 7:16 AM, Guilherme Germoglio wrote: (Hadoop is used in the benchmarks) http://dat

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-14 Thread tim robertson
Thanks for sharing this - I find these comparisons really interesting. I have a small comment after skimming this very quickly. [Please accept my apologies for commenting on such a trivial thing, but personal experience has shown this really influences performance] One thing not touched on in the

fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-14 Thread Guilherme Germoglio
(Hadoop is used in the benchmarks) http://database.cs.brown.edu/sigmod09/ There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBM