Run the Big Data Benchmark for new releases

2014-09-01 Thread Nicholas Chammas
What do people think of running the Big Data Benchmark https://amplab.cs.berkeley.edu/benchmark/ (repo https://github.com/amplab/benchmark) as part of preparing every new release of Spark? We'd run it just for Spark and effectively use it as another type of test to track any performance progress

Re: Run the Big Data Benchmark for new releases

2014-09-01 Thread Matei Zaharia
Hi Nicholas, At Databricks we already run https://github.com/databricks/spark-perf for each release, which is a more comprehensive performance test suite. Matei On September 1, 2014 at 8:22:05 PM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote: What do people think of running the Big

Re: Run the Big Data Benchmark for new releases

2014-09-01 Thread Nicholas Chammas
Oh, that's sweet. So, a related question then. Did those tests pick up the performance issue reported in SPARK- https://issues.apache.org/jira/browse/SPARK-? Does it make sense to add a new test to cover that case? On Tue, Sep 2, 2014 at 12:29 AM, Matei Zaharia matei.zaha...@gmail.com

Re: Run the Big Data Benchmark for new releases

2014-09-01 Thread Patrick Wendell
Yeah, this wasn't detected in our performance tests. We even have a test in PySpark that I would have thought might catch this (it just schedules a bunch of really small tasks, similar to the regression case). https://github.com/databricks/spark-perf/blob/master/pyspark-tests/tests.py#L51
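
For readers unfamiliar with this style of test, here is a minimal sketch of the general idea: time a large number of trivially small tasks so that per-task scheduling overhead dominates the measurement. This is not the spark-perf test linked above and does not use Spark at all. It assumes Python's standard-library thread pool as a stand-in for a task scheduler, and the names `tiny_task` and `time_small_tasks` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def tiny_task(x):
    # Deliberately trivial work, so scheduling overhead dominates the runtime.
    return x + 1


def time_small_tasks(num_tasks, num_workers=4):
    """Run num_tasks tiny tasks on a pool; return (results, elapsed_seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves input order in its results.
        results = list(pool.map(tiny_task, range(num_tasks)))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = time_small_tasks(10_000)
    per_task_us = elapsed / len(results) * 1e6
    print(f"{len(results)} tasks in {elapsed:.3f}s ({per_task_us:.1f} us per task)")
```

Comparing the per-task time across releases is what would flag a scheduling regression. The absolute numbers from a sketch like this mean little on their own.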

Re: Run the Big Data Benchmark for new releases

2014-09-01 Thread Nicholas Chammas
Alright, sounds good! I've created databricks/spark-perf/issues/9 https://github.com/databricks/spark-perf/issues/9 as a reminder for us to add a new test once we've root caused SPARK-. On Tue, Sep 2, 2014 at 1:07 AM, Patrick Wendell pwend...@gmail.com wrote: Yeah, this wasn't detected in