Hi Mike,

This project contains some small synthetic benchmarks: https://github.com/amplab/spark-perf. Otherwise, for ML algorithms, look in mllib -- it comes with driver programs for K-means, logistic regression, matrix factorization, etc., as well as data generators for them.
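If you want to check the shape of such a workload locally before running it on a cluster, here is a minimal Python sketch. It assumes a generator in the spirit of the mllib ones: k centers drawn as Gaussian coordinates scaled by r, and unit-variance Gaussian noise around each center. The function names and parameters (generate_kmeans_data, kmeans, cost, r) are hypothetical and are not the mllib API. The clustering step is plain Lloyd's iteration on an in-memory list, not Spark.

```python
import random

def generate_kmeans_data(num_points, k, dim, r, seed=42):
    """Hypothetical generator: k centers with N(0, 1) * r coordinates,
    each point = its center + N(0, 1) noise in every coordinate."""
    rng = random.Random(seed)
    centers = [[rng.gauss(0.0, 1.0) * r for _ in range(dim)] for _ in range(k)]
    points = []
    for i in range(num_points):
        center = centers[i % k]  # spread points round-robin over the centers
        points.append([x + rng.gauss(0.0, 1.0) for x in center])
    return centers, points

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    dim = len(points[0])
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p in points:
            j = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            counts[j] += 1
            for d, x in enumerate(p):
                sums[j][d] += x
        for j in range(k):
            if counts[j] > 0:  # an empty cluster keeps its previous centroid
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids

def cost(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)

if __name__ == "__main__":
    centers, points = generate_kmeans_data(1000, k=3, dim=2, r=10.0)
    fitted = kmeans(points, 3)
    print("average cost per point:", cost(points, fitted) / len(points))
```

With unit-variance noise, the average cost against the true centers should be roughly the dimension (about 2 here), which gives a quick sanity check on a run. Lloyd's iteration can still land in a local optimum from a bad random start, so treat a much larger cost as a sign to try another seed rather than as a bug.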
Matei

On Aug 23, 2013, at 5:12 PM, Mike wrote:
> I'm looking to put together some representative tests for Spark. Where
> can I find such data and code? There must be some already existing.
> Some tests (logistic regression, k-means, PageRank) are mentioned in the
> RDD paper, for example.
