Joe McDonnell created IMPALA-10455:
--------------------------------------

             Summary: Reorder Maven repositories to have cleaner mirror semantics
                 Key: IMPALA-10455
                 URL: https://issues.apache.org/jira/browse/IMPALA-10455
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend, Infrastructure
    Affects Versions: Impala 4.0
            Reporter: Joe McDonnell


Using a Maven mirror to replace Maven Central can speed up the Impala build
substantially. However, the artifacts in the toolchain S3 bucket are unlikely to
be resolvable by the mirror, because they are not in Maven Central or other
repositories. If the Maven mirror has a long list of source repositories, a miss
can be expensive, because the mirror may try each of its source repositories in
turn. It would be useful to exclude the S3 bucket Maven repositories from
mirroring. For example, this settings.xml would do that:
{noformat}
<settings>
  <mirrors>
    <mirror>
      <mirrorOf>external:*,!impala.cdp.repo</mirrorOf>
      <name>mirror-repo</name>
      <url>http://url.to.the.mirror/</url>
      <id>mirror-repo</id>
    </mirror>
  </mirrors>
</settings>{noformat}
It mirrors everything that is not local and not from impala.cdp.repo (which 
points to an S3 bucket).

Unfortunately, this rule doesn't work: every lookup still tries the mirror.
Maven tries repositories in the order they are specified in the pom.xml, and it
sees cdh.rcs.releases.repo before it sees impala.cdp.repo (see
[java/pom.xml|https://github.com/apache/impala/blob/master/java/pom.xml#L150]).
It also sees multiple banned repos (i.e. repos where both snapshots and releases
are disabled). Based on my testing, seeing cdh.rcs.releases.repo causes Maven to
try the mirror, because that repo matches the mirrorOf conditions. The banned
repositories may also be a problem, depending on how smart Maven is.

Reordering the repositories can fix these semantics. If impala.cdp.repo comes
first (along with impala.toolchain.kudu.repo), then any artifact that can be
resolved from those repositories would avoid hitting the mirror. Specifically,
the best ordering seems to be impala.toolchain.kudu.repo (a local filesystem
repo), then impala.cdp.repo (an S3 repo), then the normal server repos, and
lastly the banned repositories.
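A rough sketch of what the proposed ordering in java/pom.xml could look like.
Only the repository ids impala.toolchain.kudu.repo, impala.cdp.repo and
cdh.rcs.releases.repo come from this issue; all URLs and the banned repository
id (example.banned.repo) are hypothetical placeholders, not the real values, and
other elements such as names and snapshot settings are omitted.
{noformat}
<repositories>
  <!-- 1. Local filesystem repo first (placeholder URL). file:// repos are
       already excluded by external:* in the mirrorOf rule. -->
  <repository>
    <id>impala.toolchain.kudu.repo</id>
    <url>file:///path/to/toolchain/kudu/repository</url>
  </repository>
  <!-- 2. S3-backed repo second (placeholder URL); excluded from the
       mirror by !impala.cdp.repo -->
  <repository>
    <id>impala.cdp.repo</id>
    <url>https://s3.example/path/to/cdp/maven</url>
  </repository>
  <!-- 3. Normal server repos, which the mirror is allowed to replace
       (placeholder URL) -->
  <repository>
    <id>cdh.rcs.releases.repo</id>
    <url>https://repo.example/cdh-releases-rcs</url>
  </repository>
  <!-- 4. Banned repos last: releases and snapshots both disabled
       (hypothetical id and URL) -->
  <repository>
    <id>example.banned.repo</id>
    <url>https://banned.example/repo</url>
    <releases><enabled>false</enabled></releases>
    <snapshots><enabled>false</enabled></snapshots>
  </repository>
</repositories>{noformat}
With this order, an artifact that lives in the toolchain Kudu repo or the S3
bucket should be found before Maven reaches any repository that matches the
mirrorOf pattern.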



--
This message was sent by Atlassian Jira
(v8.3.4#803005)