On Mon, Jun 12, 2017 at 11:55 PM, Michael Gottesman <mgottes...@apple.com> wrote:
> The current design assumes that in such cases, the workload will be
> increased so that is not an issue.

I understand. But clearly some part of our process is failing, because
there are multiple benchmarks in the 10ms range that have been in the tree
for months without this being fixed.

> The reason why we use the min is that statistically we are not interested
> in estimating the "mean" or "center" of the distribution. Rather, we are
> actually interested in the "speed of light" of the computation, implying
> that we are looking for the min.

I understand that. But all measurements have a certain degree of error
associated with them. Our issue is two-fold: we need to differentiate
between normal variation among samples measured under "perfect" conditions
and samples that are worse because of interference from other background
processes.

> What do you mean by anomalous results?

I mean results that significantly stand out from the measured sample
population.

>> Currently I'm working on an improved sample filtering algorithm. Stay
>> tuned for a demonstration in Benchmark_Driver (Python); if it pans out,
>> it might be time to change the adaptive sampling in DriverUtil.swift.
>
> Have you looked at using the Mann-Whitney U algorithm? (I am not sure if
> we are using it or not)

I don't know what that is. Here's what I've been doing: depending on the
"weather" on the test machine, you sometimes measure anomalies. So I'm
tracking the coefficient of variation of the sample population and purging
anomalous results (1 sigma from max) when it exceeds 5%. This results in a
quite solid sample population where the standard deviation is a meaningful
value that can be used to judge the significance of a change between master
and branch.

--Pavol
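P.S. To make the filtering concrete, here is a rough Python sketch of the
idea described above. This is not the actual Benchmark_Driver code; the
names (purge_anomalies, coefficient_of_variation) are illustrative, and it
assumes "1 sigma from max" means dropping samples that lie within one
standard deviation of the current maximum:

    # Sketch of the anomaly-purging step: while the coefficient of
    # variation of the sample population exceeds 5%, drop samples that
    # fall within one standard deviation of the current maximum.
    from statistics import mean, stdev

    def coefficient_of_variation(samples):
        return stdev(samples) / mean(samples)

    def purge_anomalies(samples, cv_threshold=0.05):
        samples = sorted(samples)
        while len(samples) > 2 and coefficient_of_variation(samples) > cv_threshold:
            cutoff = max(samples) - stdev(samples)
            filtered = [s for s in samples if s < cutoff]
            if not filtered or len(filtered) == len(samples):
                break  # nothing left to purge
            samples = filtered
        return samples

    if __name__ == "__main__":
        # Two samples hit by background load; they get purged, and the
        # standard deviation of the remainder becomes meaningful for
        # comparing the master and branch populations.
        noisy = [10.1, 10.2, 10.1, 10.3, 14.9, 15.2]
        print(purge_anomalies(noisy))  # -> [10.1, 10.1, 10.2, 10.3]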
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev