Pull request created - see https://github.com/apache/commons-compress/pull/500

Simple change to check the checksum on entries whilst looping over ones
considered to be directories.
Added two more tests: one that real tar archives ARE identified, and one that
a UTF-16 text file is NOT.
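A minimal, self-contained sketch of the idea, for readers without the PR open. It does not use commons-compress. `Entry` is a hypothetical stand-in for `TarArchiveEntry`, the list stands in for successive `getNextEntry()` calls, and the `checkInLoop` flag toggles the extra checksum test on directory entries. The final condition mirrors the 1.26.1 code quoted further down. This is an illustration under those assumptions, not necessarily the code in PR #500.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the TAR sniffing loop discussed in this thread.
// Entry is a stand-in for TarArchiveEntry, not the commons-compress API.
public class TarDetectSketch {

    // Only the three properties the detection condition looks at.
    record Entry(boolean directory, boolean checksumOk, long size) {}

    static final int TAR_TEST_ENTRY_COUNT = 10;

    // Stand-in for inputStream.getNextEntry(): null once entries run out.
    static Entry next(Iterator<Entry> it) {
        return it.hasNext() ? it.next() : null;
    }

    // checkInLoop = false mirrors the 1.26.1 loop; true also requires a valid
    // checksum before a directory entry is allowed to bump count.
    static boolean looksLikeTar(List<Entry> entries, boolean checkInLoop) {
        Iterator<Entry> it = entries.iterator();
        Entry entry = next(it);
        int count = 0;
        while (entry != null && entry.directory()
                && (!checkInLoop || entry.checksumOk())
                && count++ < TAR_TEST_ENTRY_COUNT) {
            entry = next(it);
        }
        // Same grouping Java gives the unparenthesized original:
        // && binds tighter than ||, so count > 0 alone is enough.
        return (entry != null && entry.checksumOk() && !entry.directory() && entry.size() > 0)
                || count > 0;
    }

    public static void main(String[] args) {
        // One "directory" header with a bad checksum, then end of input:
        // the situation described for the UTF-16 text file.
        List<Entry> utf16Like = List.of(new Entry(true, false, 0));
        // A small valid archive: a directory followed by a non-empty file.
        List<Entry> realTar = List.of(new Entry(true, true, 0), new Entry(false, true, 42));

        System.out.println(looksLikeTar(utf16Like, false)); // true  (lenient loop)
        System.out.println(looksLikeTar(utf16Like, true));  // false (checksum checked in loop)
        System.out.println(looksLikeTar(realTar, true));    // true
    }
}
```

With the checksum tested inside the loop, a bogus first "directory" header no longer increments count, so the trailing `|| count > 0` can only fire after at least one checksum-valid directory entry has been read.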

Gren Elliot
Senior Software Engineer
m: +44 7590 571125
p: 
w: https://www.mimecast.com
Address: https://www.mimecast.com/company/contact/

Work Protected.™

On 2024/03/19 12:18:11 Gary Gregory wrote:
> Hello Gren,
> 
> Feel free to provide a PR on GitHub (with a unit test) so we can see
> clearly what you suggest.
> 
> TY!
> Gary
> 
> On Tue, Mar 19, 2024, 7:46 AM Gren Elliot <ge...@mimecast.com.invalid>
> wrote:
> 
> > Hi,
> >
> > I’m finding that commons-compress-1.26.1 is recognising a UTF-16 text file
> > as a tar archive, unlike the previous version.
> >
> > This is the code that changed in that release, in ArchiveStreamFactory's
> > public static String detect(final InputStream in) throws ArchiveException,
> > and that now differs in detection:
> >
> > if (signatureLength >= TAR_HEADER_SIZE) {
> >     try (TarArchiveInputStream inputStream = new TarArchiveInputStream(new ByteArrayInputStream(tarHeader))) {
> >         // COMPRESS-191 - verify the header checksum
> >         // COMPRESS-644 - do not allow zero byte file entries
> >         TarArchiveEntry entry = inputStream.getNextEntry();
> >         // try to find the first non-directory entry within the first 10 entries.
> >         int count = 0;
> >         while (entry != null && entry.isDirectory() && count++ < TAR_TEST_ENTRY_COUNT) {
> >             entry = inputStream.getNextEntry();
> >         }
> >         if (entry != null && entry.isCheckSumOK() && !entry.isDirectory() && entry.getSize() > 0 || count > 0) {
> >             return TAR;
> >         }
> >     } catch (final Exception e) { // NOPMD NOSONAR
> >         // can generate IllegalArgumentException as well as IOException
> >         // auto-detection, simply not a TAR
> >         // ignored
> >     }
> > }
> >
> > I feel this is too lenient. For our test file, when the last “if” is
> > reached, entry is null and count is 1. Because && binds more tightly than
> > ||, the condition is (entry != null && ... && entry.getSize() > 0) ||
> > count > 0, so count > 0 on its own returns TAR. The comment says the loop
> > is looking for the first non-directory entry, but in our case it never
> > found one.
> >
> > By contrast, the earlier code at least checked that the checksum was OK
> > for the one entry it examined (for our test file, it isn't).
> >
> > Regards,
> >
> > Gren
> >
> 
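For reference, the header checksum that the earlier detection relied on is simple to compute. In a ustar header it is the sum of all 512 header bytes taken as unsigned values, with the 8-byte checksum field at offset 148 counted as ASCII spaces; the field itself stores that sum as octal digits. The sketch below is a standalone illustration, not the commons-compress implementation, and the class and method names are hypothetical. Some readers also accept a sum of signed bytes, which this sketch ignores. It shows that 512 bytes of UTF-16 text fail the check because the checksum field does not even hold octal digits.

```java
import java.nio.charset.StandardCharsets;

// Standalone sketch of the ustar header checksum; not the commons-compress code.
public class TarChecksumSketch {
    static final int HEADER_SIZE = 512;
    static final int CHKSUM_OFFSET = 148;
    static final int CHKSUM_LENGTH = 8;

    // Unsigned sum of all header bytes, counting the checksum field as spaces.
    static long computeChecksum(byte[] header) {
        long sum = 0;
        for (int i = 0; i < HEADER_SIZE; i++) {
            boolean inField = i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LENGTH;
            sum += inField ? ' ' : (header[i] & 0xFF);
        }
        return sum;
    }

    // Octal digits, optionally preceded by spaces/NULs and ended by a space
    // or NUL; returns -1 for anything else (e.g. UTF-16 text).
    static long storedChecksum(byte[] header) {
        long value = 0;
        boolean sawDigit = false;
        for (int i = CHKSUM_OFFSET; i < CHKSUM_OFFSET + CHKSUM_LENGTH; i++) {
            byte b = header[i];
            if (b >= '0' && b <= '7') {
                value = value * 8 + (b - '0');
                sawDigit = true;
            } else if (b == ' ' || b == 0) {
                if (sawDigit) {
                    break; // terminator after the digits
                }
            } else {
                return -1;
            }
        }
        return sawDigit ? value : -1;
    }

    static boolean verify(byte[] header) {
        if (header.length < HEADER_SIZE) {
            return false;
        }
        long stored = storedChecksum(header);
        return stored >= 0 && stored == computeChecksum(header);
    }

    // Writes the checksum as six octal digits, a NUL and a space.
    static void writeChecksum(byte[] header) {
        byte[] digits = String.format("%06o", computeChecksum(header))
                .getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(digits, 0, header, CHKSUM_OFFSET, 6);
        header[CHKSUM_OFFSET + 6] = 0;
        header[CHKSUM_OFFSET + 7] = ' ';
    }

    public static void main(String[] args) {
        // The first 512 bytes of a UTF-16 text file (big-endian, with BOM).
        byte[] utf16 = "Hello, this is just a UTF-16 text file. ".repeat(20)
                .getBytes(StandardCharsets.UTF_16);
        byte[] text = new byte[HEADER_SIZE];
        System.arraycopy(utf16, 0, text, 0, HEADER_SIZE);
        System.out.println(verify(text));   // false

        // A mostly empty header with a name and a correctly written checksum.
        byte[] header = new byte[HEADER_SIZE];
        byte[] name = "dir/".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(name, 0, header, 0, name.length);
        writeChecksum(header);
        System.out.println(verify(header)); // true
    }
}
```

Because any single changed byte alters the sum, a block of arbitrary text almost never passes this check by accident, which is why requiring it on every entry the detection loop reads is a cheap way to reject non-tar input.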
