What do people think of running the Big Data Benchmark
https://amplab.cs.berkeley.edu/benchmark/ (repo
https://github.com/amplab/benchmark) as part of preparing every new
release of Spark?
We'd run it just for Spark and effectively use it as another type of test
to track any performance progress.
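A per-release check of this kind could be sketched roughly as below. This is only an illustration and is not part of the Big Data Benchmark or Spark: the function name `find_regressions`, the 10% tolerance, and the shape of the timing data (a dict of query label to a list of runtimes in seconds) are all assumptions.

```python
from statistics import median


def find_regressions(baseline, current, tolerance=0.10):
    """Return {query: (baseline_median, current_median)} for every query
    whose median runtime grew by more than `tolerance` (hypothetical helper)."""
    regressions = {}
    for query, base_times in baseline.items():
        if query not in current:
            continue  # query was not run against the new release
        base = median(base_times)
        now = median(current[query])
        # Flag only slowdowns that exceed the allowed tolerance.
        if now > base * (1.0 + tolerance):
            regressions[query] = (base, now)
    return regressions
```

The medians are used rather than means so that a single slow run does not flag a release on its own.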
Hi Nicholas,
At Databricks we already run https://github.com/databricks/spark-perf for each
release, which is a more comprehensive performance test suite.
Matei
On September 1, 2014 at 8:22:05 PM, Nicholas Chammas
(nicholas.cham...@gmail.com) wrote:
What do people think of running the Big
Oh, that's sweet. So, a related question then.
Did those tests pick up the performance issue reported in SPARK-
https://issues.apache.org/jira/browse/SPARK-? Does it make sense to
add a new test to cover that case?
On Tue, Sep 2, 2014 at 12:29 AM, Matei Zaharia matei.zaha...@gmail.com
Yeah, this wasn't detected in our performance tests. We even have a
test in PySpark that I would have thought might catch this (it just
schedules a bunch of really small tasks, similar to the regression
case).
https://github.com/databricks/spark-perf/blob/master/pyspark-tests/tests.py#L51
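For readers who don't want to open the link, a test that "schedules a bunch of really small tasks" might look something like the sketch below. This is not the actual spark-perf code at the URL above; the function name `time_small_tasks` and its default parameters are made up for illustration. It assumes `sc` is an already-created PySpark `SparkContext` and uses only `parallelize` and `count`.

```python
import time


def time_small_tasks(sc, num_tasks=1000, trials=5):
    """Time `trials` jobs of `num_tasks` near-empty tasks each and
    return the elapsed seconds per trial (hypothetical helper)."""
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        # One element per partition: each task does almost no work,
        # so the elapsed time is dominated by scheduling overhead.
        sc.parallelize(range(num_tasks), num_tasks).count()
        timings.append(time.perf_counter() - start)
    return timings
```

With a live context this would be called as `time_small_tasks(sc)`, and the resulting timings compared between releases.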
Alright, sounds good! I've created databricks/spark-perf/issues/9
https://github.com/databricks/spark-perf/issues/9 as a reminder for us to
add a new test once we've root-caused SPARK-.
On Tue, Sep 2, 2014 at 1:07 AM, Patrick Wendell pwend...@gmail.com wrote:
Yeah, this wasn't detected in