> On Jun 12, 2017, at 10:36 PM, Pavol Vaskovic <p...@pali.sk> wrote:
> 
> As the two paragraphs after the part you quoted go on to explain, I'm 
> hoping that with this approach we could adaptively sample the benchmark until 
> we get a stable population, but starting from a lower iteration count. 
> 
> If the Python implementation bears this out, the proper solution would be to 
> change the implementation in DriverUtil.swift, from the current ~1s run with 
> adaptive num-iters to finer-grained runs. We'd be gathering more, smaller 
> samples, tossing out anomalies as we go until we gather a stable sample 
> population (with a low coefficient of variation) or run out of the allotted 
> time.
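
To make the quoted proposal concrete, here is a minimal Python sketch of that 
loop. It is only an illustration, not what DriverUtil.swift does: take_sample 
is a hypothetical callable returning one timing, the 5% threshold and the ~1s 
budget are the figures mentioned in this thread, and dropping the sample 
farthest from the median is just one assumed way of "tossing out anomalies".

```python
import statistics
import time


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


def gather_stable_samples(take_sample, max_cv=0.05, min_samples=3,
                          budget_s=1.0, clock=time.perf_counter):
    """Collect samples until their CV is at most max_cv or time runs out.

    take_sample is a hypothetical callable returning one positive timing.
    min_samples must be at least 2 so the standard deviation is defined.
    Returns (samples, is_stable).
    """
    deadline = clock() + budget_s
    samples = []
    while True:
        samples.append(take_sample())
        # Assumed purge rule: while the spread is too wide, drop the
        # sample farthest from the median, but keep at least min_samples.
        while (len(samples) > min_samples
               and coefficient_of_variation(samples) > max_cv):
            median = statistics.median(samples)
            samples.remove(max(samples, key=lambda s: abs(s - median)))
        if (len(samples) >= min_samples
                and coefficient_of_variation(samples) <= max_cv):
            return samples, True
        if clock() >= deadline:
            return samples, False
```

The function always takes at least one sample, and it reports whether the 
population stabilized so the caller can decide what to do with a noisy run.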

~1s might be longer than necessary for benchmarks with cheap setup. Another 
option is for the benchmark to call back to the Driver’s “start button” after 
setup. With no setup work, I think 200 ms is a bare minimum if we care about 
changes in the 1% range.

I’m confused, though, because I thought we agreed that all samples need to run 
with exactly the same number of iterations. So, there would be one short run to 
find the desired num_iters for each benchmark, then each subsequent invocation 
of the benchmark harness would be handed num_iters as input.
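
Roughly, the two-phase scheme described above might look like the following 
Python sketch. The names are hypothetical: run(num_iters) stands in for one 
invocation of the benchmark body, the doubling search is an assumption (any 
search that reaches the target run time would do), and target_s defaults to 
the 200 ms figure mentioned above.

```python
import time


def calibrate_num_iters(run, target_s=0.2, clock=time.perf_counter):
    """Double num_iters until one run takes at least target_s seconds.

    run(num_iters) is a hypothetical callable executing the benchmark body.
    """
    num_iters = 1
    while True:
        start = clock()
        run(num_iters)
        if clock() - start >= target_s:
            return num_iters
        num_iters *= 2


def sample_fixed(run, num_iters, num_samples=3, clock=time.perf_counter):
    """Time num_samples runs, all with the same num_iters.

    Returns the measured seconds per iteration for each sample.
    """
    per_iter = []
    for _ in range(num_samples):
        start = clock()
        run(num_iters)
        per_iter.append((clock() - start) / num_iters)
    return per_iter
```

The calibration run happens once per benchmark; every later sample reuses 
the same num_iters, so the samples stay directly comparable.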

-Andy

> This has the potential to speed up the benchmark suite with more intelligent 
> management of the measurements, instead of using the brute force of a 
> super-long runtime to drown out the errors as we do currently. 
> 
> (I am aware of various aspects this approach might introduce that have the 
> potential to mess with the caching: time measurement itself, more frequent 
> logging - this would currently rely on --verbose mode, invoking Benchmark_O 
> from Python…)
> 
> The proof is in the pudding, so I guess we'll learn if this approach would 
> work this week, when I hammer the implementation down in Python for 
> demonstration. 
> 
> --Pavol
> 
> On Tue, 13 Jun 2017 at 03:19, Andrew Trick <atr...@apple.com> wrote:
> 
>> On Jun 12, 2017, at 4:45 PM, Pavol Vaskovic <p...@pali.sk> wrote:
>> 
>> I have sketched an algorithm for getting more consistent test results; so 
>> far it's in Numbers. I have run the whole test suite for 100 samples and 
>> observed the varying distribution of test results. The first result is quite 
>> often an outlier, with subsequent results being quicker. Depending on the 
>> "weather" on the test machine, you sometimes measure anomalies. So I'm 
>> tracking the coefficient of variation of the sample population and purging 
>> anomalous results when it exceeds 5%. This results in a solid sample 
>> population where the standard deviation is a meaningful value that can be 
>> used in judging the significance of a change between master and branch.
> 
> That’s a reasonable approach for running 100 samples. I’m not sure how it 
> fits with the goal of minimizing turnaround time. Typically you don’t need 
> more than 3 samples (keeping in mind we’re usually averaging over thousands 
> of iterations per sample).
> 
> -Andy

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev
