On Jul 4, 2009, at 1:06 PM, Peter Kasting wrote:
On Sat, Jul 4, 2009 at 11:47 AM, Mike Belshe <m...@belshe.com> wrote:
#3: The SunSpider harness has a variance problem due to CPU power
savings modes.
This one worries me because it decreases the consistency/
reproducibility of test scores and makes it harder to compare
engines or to track one engine's scores over time. For example,
doing a bunch of CPU work just before running the benchmark can
affect whether and when the CPU throttles down during the benchmark
run.
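
One way to picture a harness-side mitigation is a short warm-up before each timed test. The sketch below is in Python rather than the harness's JavaScript, and the names warm_up and timed_run are hypothetical, not part of SunSpider. Whether a busy-spin actually keeps a given CPU out of its power-saving state depends on the hardware and the OS power policy, so treat that as an assumption.

```python
import time

def warm_up(seconds=0.2):
    """Busy-spin for roughly `seconds` so the CPU is under load before timing.

    Assumption: sustained load just before the measurement makes it less
    likely that the CPU is in a throttled state when the timer starts.
    """
    deadline = time.perf_counter() + seconds
    spins = 0
    while time.perf_counter() < deadline:
        spins += 1
    return spins

def timed_run(test, warm_up_seconds=0.2):
    """Warm up, then return the wall-clock time of test() in milliseconds."""
    warm_up(warm_up_seconds)
    start = time.perf_counter()
    test()
    return (time.perf_counter() - start) * 1000.0
```

The warm-up time is deliberately excluded from the reported figure, so every test starts from the same "already busy" condition regardless of what ran before it.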
Possible solution:
The Dromaeo test suite already incorporates the SunSpider individual
tests under a new benchmark harness which fixes all 3 of the above
issues. Thus, one approach would be to retire SunSpider 0.9 in
favor of Dromaeo. http://dromaeo.com/?sunspider Dromaeo has also
done a lot of good work to ensure statistical significance of the
results. Once we have a better benchmarking framework, it would be
great to build a new microbenchmark mix which more realistically
exercises today's JavaScript.
One complaint I have heard about the Dromaeo tests (not the harness)
is that the actual JS that gets run differs from browser to browser
(e.g. because it is a direct copy of a source library that does UA
sniffing). If this is true, it means that this suite as-is isn't
useful for comparing engines to each other.
However, the Dromaeo _harness_ is probably a win as-is.
Of course, changing anything about SunSpider raises the question of
tracking historical performance. Perhaps the harness could support
versioning, or perhaps people are simply willing to say "SunSpider
1.0 scores cannot be compared to SunSpider 0.9 scores". I believe
this is the approach the V8 benchmark takes.
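
To make the versioning idea concrete, here is a minimal sketch of a score record that carries its content version and refuses cross-version comparison. It is written in Python for brevity. Score, content_version and speedup are hypothetical names, not anything SunSpider or the V8 benchmark actually defines.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Score:
    suite: str            # e.g. "SunSpider"
    content_version: str  # version of the test content, e.g. "0.9"
    total_ms: float       # total run time in milliseconds

def speedup(baseline: Score, candidate: Score) -> float:
    """Return baseline time / candidate time, only for matching test content."""
    if (baseline.suite, baseline.content_version) != (
        candidate.suite,
        candidate.content_version,
    ):
        raise ValueError(
            f"{baseline.suite} {baseline.content_version} scores cannot be "
            f"compared to {candidate.suite} {candidate.content_version} scores"
        )
    return baseline.total_ms / candidate.total_ms
```

Under this scheme a harness-only change would leave content_version untouched, while any change to the tests themselves would bump it and make old scores incomparable by construction.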
I think versioning the test content is right, and I think we should do
that over time. I think a harness change to avoid triggering
power-saving mode on Windows would be reasonable to make without a
version change. I don't think Dromaeo is a good choice of harness: I
don't think their results are stable enough, and I am not confident in
the statistical soundness of their methodology.
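
For reference, one common way to judge whether repeated runs are stable is to report the mean with a 95% confidence interval and look at the interval's width relative to the mean. The sketch below is in Python and is not the method used by either SunSpider or Dromaeo. The function names and the 2% threshold are assumptions chosen for illustration.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Hard-coded so the sketch needs no third-party packages.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}

def summarize(times_ms):
    """Return (mean, half_width) of a 95% confidence interval for the mean."""
    n = len(times_ms)
    if n < 2:
        raise ValueError("need at least two runs")
    mean = statistics.mean(times_ms)
    sem = statistics.stdev(times_ms) / math.sqrt(n)  # standard error of the mean
    # Beyond the table, fall back to the normal value 1.96, which is
    # slightly optimistic for sample sizes just above ten.
    t = T_95.get(n - 1, 1.96)
    return mean, t * sem

def is_stable(times_ms, max_relative_error=0.02):
    """True if the 95% interval half-width is within the given fraction of the mean."""
    mean, half_width = summarize(times_ms)
    return half_width / mean <= max_relative_error
```

This assumes the run times are roughly normally distributed and independent, which throttling or background activity can easily violate.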
Regards,
Maciej
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev