Our enterprise has builds for a couple of hundred services, most of which
are Maven builds, running many times a day on a set of Jenkins build nodes.
The builds use a settings.xml that doesn't override "localRepository", so
all builds on a given node share a single local repository on that node.

We often run into weird build errors like "Could not find artifact", for
artifacts that I can clearly show are available in our intranet repository.
In every one of those cases, when we inspect the artifact's directory in
the local Maven repo on that build node, we see that it's "corrupted": for
example, the pom file is there, but the jar file is missing.

We fix these errors by simply removing the directory where the key files
are missing, then rerun the build, which correctly downloads the artifact,
and the build succeeds. We often have to do this on multiple build nodes,
although I don't have visibility into that (I'm not on the ops team), so I
don't really know whether the same artifact gets corrupted on multiple
build nodes at the same time. I doubt it.

I wish there were some way that we could either prevent some of these
incidents, or somehow automatically fix them (by removing the corrupted
directory). The latter seems to be within the realm of possibility, but
it would have to scan the entire repository constantly, looking at leaf
directories that haven't changed in several minutes, and somehow verify
the "sanity" of the contents from a very limited perspective.
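
To make the scanning idea concrete, here is a rough Python sketch of what I
have in mind. It is not something we run today, and everything in it is an
assumption on my part: the function names (find_suspect_dirs, remove_dirs),
the five-minute "settled" threshold, and the very narrow sanity rule (a pom
whose packaging is "jar", or unspecified, with no matching jar next to it).
Note that a pom without a jar can also be a perfectly normal state, for
instance when Maven only needed the pom to resolve dependencies, so this is
a heuristic; the worst case of a false positive should be an unnecessary
re-download.

```python
import shutil
import time
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: a directory is "settled" if none of its files changed recently.
MIN_AGE_SECONDS = 300


def packaging_of(pom_path):
    """Return the POM's <packaging> (default "jar"), or None if it can't be parsed."""
    try:
        root = ET.parse(pom_path).getroot()
    except (ET.ParseError, OSError):
        return None
    for child in root:
        # Drop any XML namespace prefix such as "{http://maven.apache.org/POM/4.0.0}".
        if child.tag.rsplit("}", 1)[-1] == "packaging":
            return (child.text or "jar").strip()
    return "jar"


def find_suspect_dirs(repo_root, min_age=MIN_AGE_SECONDS, now=None):
    """List version directories holding a jar-packaged pom but no matching jar."""
    now = time.time() if now is None else now
    suspects = set()
    for pom in Path(repo_root).rglob("*.pom"):
        version_dir = pom.parent
        files = [f for f in version_dir.iterdir() if f.is_file()]
        if any(now - f.stat().st_mtime < min_age for f in files):
            continue  # recently touched, a download may still be in progress
        if packaging_of(pom) != "jar":
            continue  # parent poms, other packagings, unparseable poms: skipped
        if not (version_dir / (pom.stem + ".jar")).exists():
            suspects.add(version_dir)
    return sorted(suspects)


def remove_dirs(dirs):
    """Delete the given directories so the next build re-downloads them."""
    for d in dirs:
        shutil.rmtree(d, ignore_errors=True)
```

Something like this could run periodically on each build node against the
shared local repository, first only logging what find_suspect_dirs returns,
and calling remove_dirs once the results look trustworthy. It deliberately
does nothing about a build that is downloading into a directory at the very
moment it is deleted, which is part of why I'm asking whether there is a
better approach.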

Is there anything we can do to improve this situation?
