What you seem to think would be better is to repeatedly update
SunSpider every time something gets faster, ignoring entirely
that the value of SunSpider is precisely that it has not changed.
Not quite what I'm saying :-)
I'd like benchmarks to:
a) have meaning even as browsers change over time
b) evolve. As new areas of JS (or whatever) become important,
the benchmark should have a way to include them.
Fair? Good? Bad?
It's not unreasonable, but it can't be done on a whim, and changes
cannot be made trivially. Both re-weighting SunSpider and adding new
tests as things are made faster are incredibly hard to do soundly,
because it is easy to end up obscuring meaningful data.
Take regex as an example: say SunSpider had been re-weighted for the
current generation of JS engines before anyone had looked at regex.
Regex would not have stood out as substantially slower, so it would
likely never have been investigated, leaving everyone with regex
performance an order of magnitude slower than current engines.
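
To make that concrete, here is a toy illustration of how re-weighting
an aggregate score can hide a slow subsystem. The numbers are
invented for the sake of the example, not actual SunSpider timings:

    // Hypothetical per-category times for an engine, in ms.
    var times = { string: 100, math: 95, regexp: 900 };

    // With equal weights the slow regexp category dominates the
    // total, so it stands out:
    var equalTotal = times.string + times.math + times.regexp; // 1095

    // Suppose the suite had instead been "rebalanced" so that
    // regexp contributes only a tenth as much:
    var weights = { string: 1.0, math: 1.0, regexp: 0.1 };
    var rebalanced = times.string * weights.string
                   + times.math * weights.math
                   + times.regexp * weights.regexp; // 285

    // A 10x regexp speedup saves 810 of 1095 ms (~74%) under equal
    // weights, but only 81 of 285 ms (~28%) after rebalancing;
    // the slow subsystem no longer screams for attention.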
That's why SunSpider has not been updated: after, what, a year and a
half (?), it can still show areas where performance can be improved,
and as long as it does that it remains useful.
So determining when it is sensible to update SunSpider is difficult.
You may be right, and rebalancing may reveal new areas where
performance can be improved; but if you're wrong, you run the risk of
turning the benchmark from a genuinely useful development tool into
something that is only good for producing a number at the end.
If we see one section of the test taking dramatically longer than
another, then we can assume that we have not been paying enough
attention to performance in that area; this is how we originally
noticed just how slow the regex engine was. If we had been
continually rebalancing the test over and over again, we would not
have noticed this or the other areas where performance could be
(and has been) improved. It would also break SunSpider as a means
of tracking and/or preventing performance regressions.
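
As a rough sketch of what that tracking amounts to (the file name and
the 5% threshold here are made up for illustration, not any actual
WebKit tooling): regression checking against a frozen suite is just
comparing a run to a stored baseline, and it only works if the suite
itself never moves:

    // Hypothetical regression check against a frozen suite.
    var fs = require('fs');

    function checkRegressions(baselineFile, current, threshold) {
      threshold = threshold || 1.05; // flag anything >5% slower
      var baseline = JSON.parse(fs.readFileSync(baselineFile, 'utf8'));
      var regressions = [];
      for (var test in baseline) {
        if (current[test] > baseline[test] * threshold)
          regressions.push(test);
      }
      return regressions;
    }

    // Rebalance or swap out the tests and every stored baseline
    // becomes incomparable, which is exactly the breakage described
    // above.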
Of course, iterating a benchmark does not prohibit using old versions
of it for regression testing.
But what happens when the benchmarks disagree about what counts as an
improvement? You can't improve performance against one benchmark
while testing for regressions with another.
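
A toy example of that conflict (all numbers invented): the same
change can read as a win on one suite and a regression on another,
and neither number tells you which to believe:

    // Times in ms before and after one hypothetical optimization.
    var suiteA = { before: 500, after: 460 };   // 8% faster: ship it?
    var suiteB = { before: 1100, after: 1160 }; // 5% slower: revert it?

    // Optimize against A while gating regressions on B, and any
    // change that shifts work between differently-weighted tests
    // can trigger this contradiction.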
--Oliver