Hi,

I was looking into why our CI missed the obvious problem in
DirectVmProducerBlockingTest that caused the core tests to hang.

I found two problems contributing to this that I would like to bring to our
attention.

Looking at the last known good build [1] we can see that there were 6020
tests executed.

"[WARNING] Tests run: 6020, Failures: 0, Errors: 0, Skipped: 22, Flakes: 18"

Then, looking at the first bad build [2], we can see that there were only
1592 tests executed.

[ERROR] Tests run: 1592, Failures: 1, Errors: 0, Skipped: 7

[INFO]

[ERROR] There was a timeout in the fork

So, investigating further, it seems to me that the reasons for the failure
on Upstream CI were:

1. There was no timeout for the test case. This caused it to block the
test execution until the fork (Surefire plugin?) timed out, without
causing a failure.
2. The fork timeout meant that only a small number of tests were executed,
which reduced the test coverage. The problem is that the lower coverage,
compared to the previous execution, did not trigger a build failure.

There are 2 possible corrective actions that I can think of at this moment:
1. Add the Timeout annotation to the tests that are likely to block.
2. Ensure that the build fails if the test coverage decreases compared to
the previous execution.

Action #1 is very simple and I will send a PR adding it to that test (if
you know of other tests that can block, I'd recommend adding it to them as
well, or drop me a note/ticket).
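
To illustrate what a per-test timeout buys us, here is a minimal plain-JDK
sketch of the mechanism. It is not JUnit code and not what the PR will
contain; the class and method names (BoundedRun, finishesWithin) are
hypothetical. The idea is only that a bounded wait turns an indefinite hang
into a quick, explicit failure.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, for illustration only.
final class BoundedRun {

    // Runs the body on a separate thread and reports whether it
    // completed within the given limit.
    static boolean finishesWithin(Runnable body, long timeoutMillis) {
        // Daemon thread, so a body that ignores interruption cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        Future<?> future = executor.submit(body);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The body is still blocked: interrupt it and report the timeout.
            future.cancel(true);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("Body failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A body that blocks makes finishesWithin return false once the limit has
passed, instead of waiting for the fork to be killed.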

Action #2 is not difficult, but may impact the build or release. It would
probably be better to have it in a separate profile. It would be good to
hear some feedback about this idea, though.
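
As a rough sketch of the check I have in mind, assuming we can get hold of
the Surefire summary line from the previous and the current build (how we
would obtain those is left open), something like the following would do.
The class and method names (TestCountGuard, testsRun, hasRegressed) are
hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class TestCountGuard {

    private static final Pattern TESTS_RUN = Pattern.compile("Tests run: (\\d+)");

    // Extracts the number after "Tests run:" from a summary line.
    static int testsRun(String summaryLine) {
        Matcher m = TESTS_RUN.matcher(summaryLine);
        if (!m.find()) {
            throw new IllegalArgumentException("No 'Tests run:' count in: " + summaryLine);
        }
        return Integer.parseInt(m.group(1));
    }

    // True when the current build executed fewer tests than the previous one.
    static boolean hasRegressed(String previousSummary, String currentSummary) {
        return testsRun(currentSummary) < testsRun(previousSummary);
    }
}
```

With the two summary lines quoted above, hasRegressed returns true, since
1592 is less than 6020. A real check would probably need some tolerance,
because tests are sometimes removed or moved for legitimate reasons.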

[1] https://ci-builds.apache.org/job/Camel/job/Camel%20JDK11/job/main/504/consoleFull
[2] https://ci-builds.apache.org/job/Camel/job/Camel%20JDK11/job/main/505/consoleFull

Kind regards
-- 
Otavio R. Piske
http://orpiske.net
