On Jul 4, 2009, at 1:06 PM, Peter Kasting wrote:

On Sat, Jul 4, 2009 at 11:47 AM, Mike Belshe <m...@belshe.com> wrote:
#3: The SunSpider harness has a variance problem due to CPU power savings modes.

This one worries me because it decreases the consistency and reproducibility of test scores, which makes it harder to compare engines or to track one engine's scores over time. For example, doing a bunch of CPU work just before running the benchmark can affect whether and when the CPU throttles down during the benchmark run.
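To make the variance concern concrete, here is a minimal sketch of how run-to-run spread could be quantified for repeated timings of one test. It is not part of SunSpider or Dromaeo; the function name `summarize_runs`, the sample timings, and the use of a normal approximation (the 1.96 factor) are all assumptions made for illustration.

```python
import math
import statistics


def summarize_runs(times_ms):
    """Summarize repeated timings (in ms) of a single benchmark test.

    Hypothetical helper for illustration only; not taken from any real harness.
    """
    if len(times_ms) < 2:
        raise ValueError("need at least two runs to estimate variance")
    mean = statistics.mean(times_ms)
    stdev = statistics.stdev(times_ms)  # sample standard deviation (n - 1)
    # Half-width of an approximate 95% interval for the mean.
    # Assumes a normal approximation, which is rough for small run counts.
    half_width = 1.96 * stdev / math.sqrt(len(times_ms))
    return {
        "mean": mean,
        "stdev": stdev,
        "ci95_half_width": half_width,
        "relative_error": half_width / mean,
    }


# Made-up timings: the last run is slower, as might happen if the CPU
# clocked down partway through the benchmark.
summary = summarize_runs([100.0, 101.0, 99.0, 120.0])
```

A large `relative_error` for the same engine on the same machine would suggest that the harness, rather than the engine, is contributing noise.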

Possible solution:
The Dromaeo test suite (http://dromaeo.com/?sunspider) already incorporates the individual SunSpider tests under a new benchmark harness which fixes all 3 of the above issues. Thus, one approach would be to retire SunSpider 0.9 in favor of Dromaeo. Dromaeo has also done a lot of good work to ensure statistical significance of the results. Once we have a better benchmarking framework, it would be great to build a new microbenchmark mix which more realistically exercises today's JavaScript.

One complaint I have heard about the Dromaeo tests (not the harness) is that the actual JS that gets run differs from browser to browser (e.g. because it is a direct copy of a source library that does UA sniffing). If this is true, it means that this suite as-is isn't useful for comparing engines to each other.

However, the Dromaeo _harness_ is probably a win as-is.

Of course, changing anything about SunSpider raises the question of tracking historical performance. Perhaps the harness could support versioning, or perhaps people are simply willing to say "SunSpider 1.0 scores cannot be compared to SunSpider 0.9 scores". I believe this is the approach the V8 benchmark takes.
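As a rough sketch of what versioning could look like in practice, each score could carry the suite version it was produced with, and comparisons across versions could be refused outright. The `Score` type, the `speedup` function, and the version labels below are hypothetical and are not part of SunSpider or the V8 benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """A benchmark result tagged with the suite version that produced it.

    Hypothetical type for illustration only.
    """
    suite_version: str  # e.g. "0.9" or "1.0"; labels are illustrative
    total_ms: float


def speedup(baseline: Score, candidate: Score) -> float:
    """Return how many times faster candidate is than baseline.

    Raises ValueError when the two scores come from different suite
    versions, since such scores are not comparable.
    """
    if baseline.suite_version != candidate.suite_version:
        raise ValueError(
            f"cannot compare suite version {baseline.suite_version} "
            f"with {candidate.suite_version}"
        )
    return baseline.total_ms / candidate.total_ms


# Same version: comparison is allowed.
ratio = speedup(Score("0.9", 1000.0), Score("0.9", 500.0))
```

Failing loudly on a version mismatch keeps historical charts honest at the cost of a visible break whenever the test content changes.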

I think versioning the test content is right, and I think we should do that over time. I think a harness change to avoid triggering power-saving mode on Windows would be a reasonable thing to do to the harness without a version change. I don't think Dromaeo is a good choice of harness: I don't think their results are stable enough, and I am not confident in the statistical soundness of their methodology.
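One hypothetical shape for such a harness change is to keep the CPU busy for a short period immediately before each timed section, so the timed code is less likely to start while the processor is clocked down. The sketch below only illustrates the idea; the names `busy_wait` and `timed_run` and the 50 ms default are assumptions, and whether a warm-up like this actually prevents power-saving transitions depends on the operating system and hardware.

```python
import time


def busy_wait(duration_s):
    """Spin the CPU for roughly duration_s seconds.

    Returns the number of loop iterations performed. Hypothetical helper.
    """
    deadline = time.perf_counter() + duration_s
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1
    return iterations


def timed_run(test_fn, warmup_s=0.05):
    """Run test_fn once after a CPU warm-up and return elapsed time in ms.

    The warm-up duration is an arbitrary illustrative value, not a
    measured or recommended setting.
    """
    busy_wait(warmup_s)  # warm-up work is deliberately excluded from the timing
    start = time.perf_counter()
    test_fn()
    return (time.perf_counter() - start) * 1000.0


elapsed_ms = timed_run(lambda: sum(range(100000)))
```

Because only the timing procedure changes and the test content does not, a change of this kind would not by itself require a new version number.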

Regards,
Maciej

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
