David Cockbill created ANY23-447:
------------------------------------

             Summary: Reduce Any23 dependency bloat
                 Key: ANY23-447
                 URL: https://issues.apache.org/jira/browse/ANY23-447
             Project: Apache Any23
          Issue Type: Improvement
          Components: core
    Affects Versions: 2.3
            Reporter: David Cockbill

Prompted by an email conversation with Hans Brende:
{code:java}
David, unfortunately this move won't reduce the number of core dependencies we have: the plugins and service modules are not dependencies of the core module.

However, it might be useful if you posted an issue about the dependency bloat, including the various exclusions you are using: we might be able to mitigate the problem.
{code}
This came out of having to exclude dependencies in the pom.xml for a product. (Note that not much thought went into the exclusions; I was trying to get the code size down before a release.)

Section of pom.xml:
{code:xml}
<dependency>
    <groupId>org.apache.any23</groupId>
    <artifactId>apache-any23-core</artifactId>
    <exclusions>
        <!-- Any23 brings in a lot of dependencies which bloats the shaded jar.
             This is an attempt to reduce this by excluding packages that we
             may not be using as part of Any23.
             NOTE: If dependency is required at runtime, then a
             java.lang.NoClassDefFoundError is thrown. -->
        <exclusion>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-parsers</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.bouncycastle</groupId>
            <artifactId>bcmail-jdk15on</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.bouncycastle</groupId>
            <artifactId>bcprov-jdk15on</artifactId>
        </exclusion>
        <exclusion>
            <groupId>edu.ucar</groupId>
            <artifactId>cdm</artifactId>
        </exclusion>
        <exclusion>
            <groupId>net.sf.trove4j</groupId>
            <artifactId>trove4j</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.cxf</groupId>
            <artifactId>cxf-rt-rs-client</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.github.ben-manes.caffeine</groupId>
            <artifactId>caffeine</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.opengis</groupId>
            <artifactId>geoapi</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.drewnoakes</groupId>
            <artifactId>metadata-extractor</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.eclipse.rdf4j</groupId>
            <artifactId>rdf4j-repository-sail</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.eclipse.rdf4j</groupId>
            <artifactId>rdf4j-sail-memory</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.tukaani</groupId>
            <artifactId>xz</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.codelibs</groupId>
            <artifactId>jhighlight</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.gagravarr</groupId>
            <artifactId>vorbis-java-core</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.gagravarr</groupId>
            <artifactId>vorbis-java-tika</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.opennlp</groupId>
            <artifactId>opennlp-tools</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox-tools</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-scratchpad</artifactId>
        </exclusion>
        <exclusion>
            <groupId>edu.ucar</groupId>
            <artifactId>grib</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.googlecode.mp4parser</groupId>
            <artifactId>isoparser</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.healthmarketscience.jackcess</groupId>
            <artifactId>jackcess</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.healthmarketscience.jackcess</groupId>
            <artifactId>jackcess-encrypt</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.sis.core</groupId>
            <artifactId>sis-utility</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.sis.storage</groupId>
            <artifactId>sis-netcdf</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.sis.core</groupId>
            <artifactId>sis-metadata</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.eclipse.rdf4j</groupId>
            <artifactId>rdf4j-rio-trix</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.yaml</groupId>
            <artifactId>snakeyaml</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.eclipse.rdf4j</groupId>
            <artifactId>rdf4j-rio-turtle</artifactId>
        </exclusion>
    </exclusions>
</dependency>
{code}
Some background that may be useful from my notes:
{code:java}
Whilst adding Any23
to the product, the Any23 Core package was causing Lintian to fail. Lintian is a Debian package checker written in Perl. It uses Archive::Zip to unpack any .jar file in the Debian package. This particular unzip utility does not handle the Zip64 format, which causes the failure.

The original zip format has various restrictions, one of which is a limit on the number of files in the archive. Therefore, if the number of class files in the jar for the product exceeds this limit (65535), a Zip64 format file is produced instead of a standard zip file.

The Any23 Core library does seem quite excessive in what it pulls in. Running the following against the product jar, the output goes from 40490 to 78513:

zipinfo -1 product.jar | wc -l
{code}
This Lintian failure on a Linux build was the original push for the exclusions; however, the product .jar also grew in a similar fashion.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
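
For anyone who wants to reproduce the entry count without zipinfo, the check described in the notes can be sketched in Java roughly as follows. This is only an illustration: the class name JarEntryCount and its methods are hypothetical and are not part of Any23 or Lintian, and the threshold is simply the 65535 figure quoted in the notes (some tools may already switch to Zip64 at exactly 65535 entries).

```java
import java.io.IOException;
import java.util.zip.ZipFile;

/** Hypothetical helper for illustration; not part of Any23 or Lintian. */
final class JarEntryCount {

    /** Entry limit of the classic zip format, as quoted in the notes. */
    static final int CLASSIC_ZIP_MAX_ENTRIES = 65_535;

    /** Returns true when an archive with this many entries exceeds the classic limit. */
    static boolean needsZip64(int entryCount) {
        return entryCount > CLASSIC_ZIP_MAX_ENTRIES;
    }

    /** Counts the entries in a jar, comparable to: zipinfo -1 product.jar | wc -l */
    static int countEntries(String jarPath) throws IOException {
        try (ZipFile jar = new ZipFile(jarPath)) {
            return jar.size();
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name used in the notes when no argument is given.
        String jarPath = args.length > 0 ? args[0] : "product.jar";
        int entries = countEntries(jarPath);
        System.out.println(jarPath + ": " + entries
                + " entries, exceeds classic zip limit: " + needsZip64(entries));
    }
}
```

With the two counts from the notes, needsZip64(40490) is false and needsZip64(78513) is true, which matches the point at which Lintian started failing.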