> On Jun 12, 2017, at 4:54 PM, Pavol Vaskovic <p...@pali.sk> wrote:
>
> On Mon, Jun 12, 2017 at 11:55 PM, Michael Gottesman <mgottes...@apple.com> wrote:
> > The current design assumes that in such cases, the workload will be
> > increased so that is not an issue.
>
> I understand. But clearly some part of our process is failing, because there
> are multiple benchmarks in the 10ms range that have been in the tree for
> months without being fixed.
I think that is just inertia and being busy. Patch? I'll review = ).

> > The reason why we use the min is that statistically we are not interested
> > in estimating the "mean" or "center" of the distribution. Rather, we are
> > actually interested in the "speed of light" of the computation, implying
> > that we are looking for the min.
>
> I understand that. But all measurements have a certain degree of error
> associated with them. Our issue is two-fold: we need to differentiate between
> normal variation between measured samples under "perfect" conditions and
> samples that are worse because of interference from other background
> processes.

I disagree. CPUs are inherently messy, but once you have quieted down your
system by unloading a few processes, the disruptions tend to be temporary
spikes.

> > What do you mean by anomalous results?
>
> I mean results that significantly stand out from the measured sample
> population.

What that could mean is that we need to run a couple of extra iterations to
warm up the CPU/cache/etc. before we start gathering samples.

> >> Currently I'm working on an improved sample filtering algorithm. Stay
> >> tuned for a demonstration in Benchmark_Driver (Python); if it pans out,
> >> it might be time to change the adaptive sampling in DriverUtil.swift.
> >
> > Have you looked at using the Mann-Whitney U algorithm? (I am not sure if
> > we are using it or not)
>
> I don't know what that is.

Check it out: https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test. It is
a non-parametric test of whether two sets of samples come from the same
distribution. As a bonus, it does not assume that our data comes from a normal
distribution (a problem with using mean/standard deviation, which assumes a
normal distribution). We have been using Mann-Whitney internally for a while,
successfully, to reduce the noise (rough sketch at the bottom of this mail).

> Here's what I've been doing:
>
> Depending on the "weather" on the test machine, you sometimes measure
> anomalies. So I'm tracking the coefficient of variation of the sample
> population and purging anomalous results (1 sigma from the max) when it
> exceeds 5%. This results in a quite solid sample population where the
> standard deviation is a meaningful value that can be used in judging the
> significance of a change between master and branch.
>
> --Pavol
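Just to check that I am reading your purging rule correctly: here is a minimal
sketch of it in Python (illustrative only, not your actual Benchmark_Driver
code; the function name, the microsecond units, and the reading of "1 sigma
from max" as dropping everything within one standard deviation of the maximum
are my assumptions):

    # Illustrative sketch only -- not the actual Benchmark_Driver code.
    # Assumes `samples` is a list of per-iteration runtimes (e.g. in
    # microseconds).
    import statistics

    def purge_anomalies(samples, max_cv=0.05):
        """While the coefficient of variation exceeds max_cv (5%), drop
        samples that lie within one standard deviation of the maximum."""
        samples = sorted(samples)
        while len(samples) > 2:
            mean = statistics.mean(samples)
            sd = statistics.stdev(samples)
            if sd / mean <= max_cv:
                break  # population is tight enough; stop purging
            cutoff = max(samples) - sd
            samples = [s for s in samples if s <= cutoff]
        return samples

For example, purge_anomalies([103, 104, 104, 105, 162]) drops the 162 outlier
and leaves a population whose standard deviation is actually meaningful. Is
that roughly what you are doing?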
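And for reference, the Mann-Whitney check I mentioned above is essentially a
one-liner on top of SciPy. A minimal sketch, assuming SciPy is available to
the Python driver (the function name and the 0.05 significance threshold are
just examples, not our internal implementation):

    # Illustrative sketch -- not the internal implementation mentioned above.
    from scipy.stats import mannwhitneyu

    def is_significant_change(old_samples, new_samples, alpha=0.05):
        """Return True if the two sample populations are unlikely to come
        from the same distribution (two-sided test at level alpha)."""
        _, p_value = mannwhitneyu(old_samples, new_samples,
                                  alternative='two-sided')
        return p_value < alpha

Because the test only looks at ranks, a handful of slow outliers from
background noise ends up in the tail of the ranking instead of dragging the
comparison around the way it drags a mean.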