David Cockbill created ANY23-447:
------------------------------------

             Summary: Reduce Any23 dependency bloat
                 Key: ANY23-447
                 URL: https://issues.apache.org/jira/browse/ANY23-447
             Project: Apache Any23
          Issue Type: Improvement
          Components: core
    Affects Versions: 2.3
            Reporter: David Cockbill


Prompted by an email conversation with Hans Brende:
{code:java}
David, unfortunately this move won't reduce the number of core dependencies
we have: the plugins and service modules are not dependencies of the core
module. However, it might be useful if you posted an issue about the
dependency bloat, including the various exclusions you are using: we might
be able to mitigate the problem.
{code}
This came out of having to exclude dependencies in the pom.xml for a 
product. (Note that not much thought went into the exclusions; I was 
trying to get the code size down before a release.) Relevant section of the pom.xml:
{code:xml}
    <dependency>
      <groupId>org.apache.any23</groupId>
      <artifactId>apache-any23-core</artifactId>
        <exclusions>
          <!-- Any23 brings in a lot of dependencies, which bloats the shaded jar.
               This is an attempt to reduce the bloat by excluding packages
               that we may not be using as part of Any23.
               NOTE: If an excluded dependency is required at runtime, a
               java.lang.NoClassDefFoundError is thrown.  -->
          
          <exclusion>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-parsers</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.bouncycastle</groupId>
            <artifactId>bcmail-jdk15on</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.bouncycastle</groupId>
            <artifactId>bcprov-jdk15on</artifactId>
          </exclusion>
          <exclusion>
            <groupId>edu.ucar</groupId>
            <artifactId>cdm</artifactId>
          </exclusion>
          <exclusion>
            <groupId>net.sf.trove4j</groupId>
            <artifactId>trove4j</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.apache.cxf</groupId>
            <artifactId>cxf-rt-rs-client</artifactId>
          </exclusion>
          <exclusion>
            <groupId>com.github.ben-manes.caffeine</groupId>
            <artifactId>caffeine</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.opengis</groupId>
            <artifactId>geoapi</artifactId>
          </exclusion>  
          <exclusion>
            <groupId>com.drewnoakes</groupId>
            <artifactId>metadata-extractor</artifactId>
          </exclusion> 
          <exclusion>
            <groupId>org.eclipse.rdf4j</groupId>
            <artifactId>rdf4j-repository-sail</artifactId>
          </exclusion> 
          <exclusion>
            <groupId>org.eclipse.rdf4j</groupId>
            <artifactId>rdf4j-sail-memory</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.tukaani</groupId>
            <artifactId>xz</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.codelibs</groupId>
            <artifactId>jhighlight</artifactId>
          </exclusion> 
          <exclusion>
            <groupId>org.gagravarr</groupId>
            <artifactId>vorbis-java-core</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.gagravarr</groupId>
            <artifactId>vorbis-java-tika</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.apache.opennlp</groupId>
            <artifactId>opennlp-tools</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox-tools</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-scratchpad</artifactId>
          </exclusion>
          <exclusion>
            <groupId>edu.ucar</groupId>
            <artifactId>grib</artifactId>
          </exclusion>  
          <exclusion>
            <groupId>com.googlecode.mp4parser</groupId>
            <artifactId>isoparser</artifactId>
          </exclusion>
          <exclusion>
            <groupId>com.healthmarketscience.jackcess</groupId>
            <artifactId>jackcess</artifactId>
          </exclusion>
          <exclusion>
            <groupId>com.healthmarketscience.jackcess</groupId>
            <artifactId>jackcess-encrypt</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.apache.sis.core</groupId>
            <artifactId>sis-utility</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.apache.sis.storage</groupId>
            <artifactId>sis-netcdf</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.apache.sis.core</groupId>
            <artifactId>sis-metadata</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.eclipse.rdf4j</groupId>
            <artifactId>rdf4j-rio-trix</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.yaml</groupId>
            <artifactId>snakeyaml</artifactId>
          </exclusion>        
          <exclusion>
            <groupId>org.eclipse.rdf4j</groupId>
            <artifactId>rdf4j-rio-turtle</artifactId>
          </exclusion>         
        </exclusions>
    </dependency>
{code}
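Because an over-eager exclusion only surfaces at runtime, a small classpath probe can serve as a smoke test after editing the exclusion list. The following is a minimal sketch, not part of Any23: the class name `ExclusionProbe` is hypothetical, and the default class names are only well-known entry points of three of the excluded artifacts (bcprov, snakeyaml, pdfbox), chosen for illustration. A class being absent does not by itself mean Any23 needs it; the probe only reports what is no longer loadable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reports which of the given classes cannot be loaded. */
final class ExclusionProbe {

    /** Returns the subset of classNames that is not loadable from this class's class loader. */
    static List<String> findMissing(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ExclusionProbe.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: load the class without running its static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // NoClassDefFoundError is a LinkageError, so it is covered here too.
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative defaults only; pass your own class names as arguments.
        List<String> candidates = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList(
                        "org.bouncycastle.jce.provider.BouncyCastleProvider",
                        "org.yaml.snakeyaml.Yaml",
                        "org.apache.pdfbox.pdmodel.PDDocument");
        for (String name : findMissing(candidates)) {
            System.out.println("not on classpath: " + name);
        }
    }
}
```

Run it with the shaded jar on the classpath. It complements, rather than replaces, exercising the Any23 extractors the product actually uses.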
Some background that may be useful from my notes:
{code:java}
Whilst adding Any23 to the product, the Any23 Core package was causing Lintian to 
fail.

Lintian is a Debian package checker written in Perl. It uses 
Archive::Zip to unpack any .jar file in the Debian package. This particular 
unzip library does not handle the Zip64 format, which caused the failure. The 
original zip format has various restrictions, one of which is the number of 
files in the archive. Therefore, if the number of files in the product jar 
exceeds this limit (65535), a Zip64 file is produced instead of a 
standard zip file.

The Any23 Core Library does seem quite excessive in what it pulls in. Running 
the following on the product jar, the entry count goes from 40490 to 78513 
once Any23 is added.

zipinfo -1 product.jar | wc -l
{code}

This Lintian failure on a Linux build was the original push for the exclusions; 
however, the product .jar also grew in a similar fashion.
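
The entry-count check can also be done from Java, for example as a build-time guard. This is a minimal sketch under stated assumptions: the class name `JarEntryCounter` is hypothetical, and the 65535 threshold is the figure quoted in the notes above. It looks only at the entry count, although an archive can also need Zip64 for size reasons, and some zip writers switch to Zip64 at exactly 65535 entries rather than above it.

```java
import java.io.IOException;
import java.util.zip.ZipFile;

/** Hypothetical helper: counts archive entries, like `zipinfo -1 product.jar | wc -l`. */
final class JarEntryCounter {

    /** Entry-count limit of the classic zip format, as quoted in the notes above. */
    static final int CLASSIC_ZIP_MAX_ENTRIES = 65_535;

    /** Returns the number of entries (files and directory entries) in the archive. */
    static int countEntries(String jarPath) throws IOException {
        try (ZipFile zip = new ZipFile(jarPath)) {
            return zip.size();
        }
    }

    /** True if the entry count alone exceeds the classic zip limit. */
    static boolean exceedsClassicLimit(int entryCount) {
        return entryCount > CLASSIC_ZIP_MAX_ENTRIES;
    }

    public static void main(String[] args) throws IOException {
        int entries = countEntries(args[0]);
        System.out.println(entries + " entries; exceeds classic zip limit: "
                + exceedsClassicLimit(entries));
    }
}
```

With the figures above, 40490 entries stays under the limit and 78513 does not.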



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
