Pull request created - see https://github.com/apache/commons-compress/pull/500
Simple change to check the checksum on entries whilst looping over ones
considered to be directories. Added 2 more tests: that real tar archives ARE
identified, and that one UTF-16 text file is NOT.

Gren Elliot

On 2024/03/19 12:18:11 Gary Gregory wrote:
> Hello Gren,
>
> Feel free to provide a PR on GitHub (with a unit test) so we can see
> clearly what you suggest.
>
> TY!
> Gary
>
> On Tue, Mar 19, 2024, 7:46 AM Gren Elliot <ge...@mimecast.com.invalid>
> wrote:
>
> > Hi,
> >
> > I’m finding that commons-compress-1.26.1 is recognising a UTF-16 text file
> > as a tar archive, unlike the previous version.
> >
> > This is the code that changed in that release, in ArchiveStreamFactory's
> > public static String detect(final InputStream in) throws ArchiveException,
> > and that differs in detection:
> >
> >     if (signatureLength >= TAR_HEADER_SIZE) {
> >         try (TarArchiveInputStream inputStream =
> >                 new TarArchiveInputStream(new ByteArrayInputStream(tarHeader))) {
> >             // COMPRESS-191 - verify the header checksum
> >             // COMPRESS-644 - do not allow zero byte file entries
> >             TarArchiveEntry entry = inputStream.getNextEntry();
> >             // try to find the first non-directory entry within the first 10 entries.
> >             int count = 0;
> >             while (entry != null && entry.isDirectory() && count++ < TAR_TEST_ENTRY_COUNT) {
> >                 entry = inputStream.getNextEntry();
> >             }
> >             if (entry != null && entry.isCheckSumOK() && !entry.isDirectory()
> >                     && entry.getSize() > 0 || count > 0) {
> >                 return TAR;
> >             }
> >         } catch (final Exception e) { // NOPMD NOSONAR
> >             // can generate IllegalArgumentException as well as IOException
> >             // auto-detection, simply not a TAR
> >             // ignored
> >         }
> >     }
> >
> > I feel this is being too lenient.
> > For instance, at the last “if” statement, for the test file, entry is null
> > and count=1. The code suggests it is looking for the first non-directory
> > entry. It hasn’t found a non-directory entry in our case.
> >
> > The earlier code at least checked that the checksum was OK for the one
> > entry it checked (it isn’t for our test file…)
> >
> > Regards,
> >
> > Gren
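
The two points raised in this thread can be illustrated without Commons Compress at all. What follows is a minimal sketch under stated assumptions, not the code from the PR. It assumes the standard ustar header layout (a 512-byte block whose checksum is an 8-byte octal field at offset 148, computed with that field counted as spaces). The class name TarChecksumSketch, all helper names, and the sample entry name are hypothetical. The lenientCondition method models only the boolean structure of the quoted "if" statement, to show that "count > 0" on its own is enough to make it true.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; none of these names come from Commons Compress.
final class TarChecksumSketch {
    static final int HEADER_SIZE = 512;   // one tar header block
    static final int CHKSUM_OFFSET = 148; // checksum field offset (ustar layout, assumed)
    static final int CHKSUM_LEN = 8;

    /** Sums the header bytes as unsigned values, counting the checksum field as spaces. */
    static long computeSum(byte[] header) {
        long sum = 0;
        for (int i = 0; i < HEADER_SIZE; i++) {
            boolean inField = i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LEN;
            sum += inField ? ' ' : (header[i] & 0xff);
        }
        return sum;
    }

    /** Parses the stored octal checksum, skipping leading spaces and NULs. */
    static long storedSum(byte[] header) {
        long value = 0;
        int i = CHKSUM_OFFSET;
        int end = CHKSUM_OFFSET + CHKSUM_LEN;
        while (i < end && (header[i] == ' ' || header[i] == 0)) {
            i++;
        }
        while (i < end && header[i] >= '0' && header[i] <= '7') {
            value = value * 8 + (header[i] - '0');
            i++;
        }
        return value;
    }

    static boolean checksumOk(byte[] header) {
        return header != null && header.length >= HEADER_SIZE
                && storedSum(header) == computeSum(header);
    }

    /** Builds a minimal header block for a sample entry name and fills in its checksum. */
    static byte[] sampleHeader(String name) {
        byte[] header = new byte[HEADER_SIZE];
        byte[] nameBytes = name.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(nameBytes, 0, header, 0, Math.min(nameBytes.length, 100));
        // Six octal digits followed by NUL and space.
        String octal = String.format("%06o", computeSum(header));
        byte[] field = (octal + "\0 ").getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(field, 0, header, CHKSUM_OFFSET, CHKSUM_LEN);
        return header;
    }

    /** First 512 bytes of some UTF-16LE text, standing in for the test file. */
    static byte[] utf16Block() {
        StringBuilder sb = new StringBuilder();
        while (sb.length() < HEADER_SIZE) {
            sb.append("plain text, not a tar archive\n");
        }
        return Arrays.copyOf(sb.toString().getBytes(StandardCharsets.UTF_16LE), HEADER_SIZE);
    }

    /** Same boolean shape as the quoted condition: && binds tighter than ||. */
    static boolean lenientCondition(boolean entryFound, boolean checksumOk,
                                    boolean isDirectory, long size, int count) {
        return entryFound && checksumOk && !isDirectory && size > 0 || count > 0;
    }

    public static void main(String[] args) {
        System.out.println(checksumOk(sampleHeader("hello.txt")));         // true
        System.out.println(checksumOk(utf16Block()));                      // false
        // entry == null and count == 1, as described for the test file:
        System.out.println(lenientCondition(false, false, false, 0L, 1));  // true
    }
}
```

The last line of main reproduces the case described above: no non-directory entry was found, yet the condition still evaluates to true because one entry was skipped as a "directory". A checksum test on each skipped entry would reject the UTF-16 block at that point.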