Hello,

TL;DR - Juju has lots of data races. Please test your code with the
-race flag to ensure it doesn't get worse while we try to fix the
problem.

Longer version:

In debugging https://bugs.launchpad.net/bugs/1456398 I found that
there are multiple data races in the Juju code base. It's long been
suspected that the tests are racy: PatchValue makes it super easy to
introduce a race if all the workers started by a test have not exited
before the suite's tear down function runs.
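
To make the PatchValue problem concrete, here is a minimal sketch. It
does not use Juju's testing package: patchValue, timeout and runPatched
are hypothetical names made up for illustration, and the real helper
may differ in its details. It only shows the ordering that matters,
which is to wait for every worker to exit before the patched value is
restored.

```go
package main

import (
	"fmt"
	"sync"
)

// timeout stands in for a package-level variable that a test patches.
// (Hypothetical; not a real Juju variable.)
var timeout = "default"

// patchValue is a hypothetical stand-in for a PatchValue-style helper:
// it overwrites *dest and returns a function that restores the old value.
func patchValue(dest *string, value string) (restore func()) {
	old := *dest
	*dest = value
	return func() { *dest = old }
}

// runPatched patches timeout, starts a worker that reads it, and only
// restores the original value after the worker has exited.
func runPatched() (seen, after string) {
	restore := patchValue(&timeout, "patched")

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		seen = timeout // read from the worker goroutine
	}()

	// Without this Wait, restore() would write timeout while the worker
	// may still be reading it. That unsynchronised read/write pair is
	// the kind of race a tear down function can introduce.
	wg.Wait()
	restore()
	return seen, timeout
}

func main() {
	seen, after := runPatched()
	fmt.Println(seen, after) // prints "patched default"
}
```

If the wg.Wait() call is removed, the restore becomes an unsynchronised
write and the race detector can be expected to report it whenever the
two accesses overlap.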

However, more serious races have been discovered, such as
https://bugs.launchpad.net/bugs/1456857 which affects code all the way
back to 1.22.

Why is a data race bad?

Ok, so you're looking at https://bugs.launchpad.net/bugs/1456857 and
you're thinking, so maybe the TLS code accidentally uses the wrong
certificate for a little bit, how bad is that?

The problem is data races affect the integrity of the structures that
the garbage collector uses. In the example above replacing the
certificate means one CPU can see the new value, and another
potentially the old value. When it comes time to run the GC, depending
on which CPU walks that chain of pointers it may think that the old
certificate is still live, or the new certificate is unreachable --
and boom that memory is marked as free and the certificate corrupted.

The short version is: there are no safe data races, and Juju is not
reliable until they have all been fixed.
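
As a rough illustration of what a fix for this class of bug can look
like, the sketch below guards a replaceable value with a
sync.RWMutex. It is not the actual Juju code or the actual fix for the
bug above: certHolder, Set, Get and swapWhileReading are hypothetical
names, and the certificate is represented as a plain string to keep
the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// certHolder is a hypothetical type, not Juju's actual code. It guards
// a replaceable certificate (a string here) with a read/write mutex.
type certHolder struct {
	mu   sync.RWMutex
	cert string
}

// Set replaces the certificate under the write lock.
func (h *certHolder) Set(cert string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.cert = cert
}

// Get returns the current certificate under the read lock.
func (h *certHolder) Get() string {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.cert
}

// swapWhileReading replaces the certificate while readers run
// concurrently, then returns the value visible once all have finished.
func swapWhileReading() string {
	h := &certHolder{cert: "old"}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = h.Get() // sees either "old" or "new", both via the lock
		}()
	}

	h.Set("new")
	wg.Wait()
	return h.Get()
}

func main() {
	fmt.Println(swapWhileReading()) // prints "new"
}
```

Every access to the shared field goes through the mutex, so each read
is ordered either before or after the replacement, and go test -race
has nothing to report for it.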

How to run the race detector?

The race detector comes with Go and is available by adding the -race
flag to invocations of go test, so what was

    go test github.com/juju/juju/...

becomes

    go test -race github.com/juju/juju/...

The downside is that the race detector has significant overhead, at
least 2x, so tests will be even slower.

Thanks

Dave

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev