I'd like to understand what's going to happen with SunSpider in the future. Here is a set of questions and criticisms. I'm interested in how these can be addressed.
There are 3 areas I'd like to see improved in SunSpider, some of which we've discussed before:

#1: SunSpider is currently version 0.9. Will SunSpider ever change, or is it static?

I believe that benchmarks need to be able to move with the times. As JS engines change and improve, and as new areas need to be benchmarked, we need to be able to roll the version, fix bugs, and benchmark new features. The SunSpider version has not changed for ~2yrs. How can we change this situation? Are there plans for a new version already underway?

#2: Use of summing as a scoring mechanism is problematic.

Unfortunately, sum-based scoring does not withstand the test of time as browsers improve. When the benchmark was first introduced, each test was equally weighted and reasonably large. Over time, however, the benchmark has become dominated by the slowest tests - effectively, the weighting of the individual tests varies with the performance of the JS engine under test. Today's engines spend ~50% of their time on just the string and date tests. The other tests are largely irrelevant at this point, and they become less relevant every day. Eventually many of the tests will take near-zero time, and the benchmark will have to be scrapped unless we figure out a better way to score it. Benchmarking research that long pre-dates SunSpider confirms that geometric means provide a better basis for comparison: http://portal.acm.org/citation.cfm?id=5673

Can future versions of the SunSpider driver be built so that they won't become irrelevant over time?

#3: The SunSpider harness has a variance problem due to CPU power-saving modes.

Because the test runs a tiny amount of JavaScript (often under 10ms) followed by a 500ms sleep, CPUs drop into power-saving modes between test runs. This radically changes the performance measurements, so a comparison between two runs depends on the user's power-saving mode.
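As a concrete illustration of the scoring issue in #2, here's a small sketch (the test names and times are made up, not real SunSpider numbers): under a sum, the slowest tests dominate the score, while under a geometric mean a 2x speedup on any test moves the score by the same factor.

```javascript
// Hypothetical per-test times in ms (illustrative only, not real SunSpider data).
const times = { "string-ops": 300, "date-ops": 250, "math": 5, "bitops": 3 };

// Sum-based score: dominated by whichever tests happen to be slowest.
function sumScore(t) {
  return Object.values(t).reduce((a, b) => a + b, 0);
}

// Geometric mean: every test contributes equally, regardless of magnitude.
function geomeanScore(t) {
  const vals = Object.values(t);
  return Math.exp(vals.reduce((a, b) => a + Math.log(b), 0) / vals.length);
}

// Halving "math" (5ms -> 2.5ms) moves the sum by under 0.5%, while halving
// "string-ops" moves it by ~27%. Under the geometric mean, both 2x speedups
// shrink the score by the same factor, 2^(1/4).
```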
To demonstrate this, run SunSpider on two machines: one with the Windows "balanced" (default) power setting, and again with "high performance". It's easy to see skews of 30% between these two modes. I think we should change the test harness to avoid such accidental effects.

(BTW - if you change SunSpider's sleep from 500ms to 10ms, the test runs in just a few seconds. It is unclear to me why the pauses are so large. My browser gets a 650ms score, so five runs should take on the order of 3 seconds. But because of the pauses, the test takes over a minute to run, leaving the CPU ~96% idle.)

Possible solution: the Dromaeo test suite already incorporates the individual SunSpider tests under a new benchmark harness that fixes all 3 of the issues above. Thus, one approach would be to retire SunSpider 0.9 in favor of Dromaeo: http://dromaeo.com/?sunspider

Dromaeo has also done a lot of good work to ensure the statistical significance of its results. Once we have a better benchmarking framework, it would be great to build a new microbenchmark mix that more realistically exercises today's JavaScript.

Thanks,
Mike
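P.S. For what it's worth, here is a minimal sketch of the kind of harness change I mean for #3 (my own illustration, not Dromaeo's actual code; the function and test format are made up): run the iterations back-to-back with no long sleeps, so the CPU never gets a chance to drop into a power-saving state between runs.

```javascript
// Minimal harness sketch: back-to-back timed runs with no inter-run sleep.
// (Illustrative only - runBenchmark and its test format are invented here.)
function runBenchmark(tests, iterations) {
  const results = {};
  for (const [name, fn] of Object.entries(tests)) {
    const samples = [];
    for (let i = 0; i < iterations; i++) {
      const start = Date.now();
      fn();
      samples.push(Date.now() - start);
      // No 500ms pause here: the next iteration starts immediately,
      // so the CPU stays busy and stays at full clock speed.
    }
    results[name] = samples;
  }
  return results;
}

// Example: five back-to-back runs of a tiny workload.
const results = runBenchmark({
  "tiny-loop": () => { let s = 0; for (let i = 0; i < 1e6; i++) s += i; }
}, 5);
```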
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev