Hm, I see where it is throwing the exception. Would you create a Jira
ticket for this feature request and attach at least one example gz file and
a failing JUnit test?
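To pin down the requested behavior, one possible caller-side stopgap is to wrap the decompressing stream and report end of stream when the "Garbage after a valid ..." IOException surfaces. This is only a sketch, not an existing Commons Compress or Commons IO API. The class name StopOnTrailingGarbageInputStream is hypothetical, and matching on the exception message is an assumption: the bzip2 and xz messages may not use the same wording as the gzip one.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;

/**
 * Hypothetical wrapper (not part of Commons Compress or Commons IO): turns a
 * "Garbage after a valid ..." IOException from the wrapped decompressor into
 * end of stream instead of propagating it.
 */
class StopOnTrailingGarbageInputStream extends FilterInputStream {

    // Assumption: trailing garbage is reported with a message containing this text.
    private static final String MARKER = "garbage after a valid";

    private boolean stopped;

    public StopOnTrailingGarbageInputStream(InputStream in) {
        super(in);
    }

    private static boolean isTrailingGarbage(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.toLowerCase(Locale.ROOT).contains(MARKER);
    }

    @Override
    public int read() throws IOException {
        if (stopped) {
            return -1;
        }
        try {
            return in.read();
        } catch (IOException e) {
            if (!isTrailingGarbage(e)) {
                throw e; // any other I/O problem still propagates
            }
            stopped = true; // report EOF from now on
            return -1;
        }
    }

    // FilterInputStream.read(byte[]) delegates here, so both bulk reads are covered.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (stopped) {
            return len == 0 ? 0 : -1;
        }
        try {
            return in.read(b, off, len);
        } catch (IOException e) {
            if (!isTrailingGarbage(e)) {
                throw e;
            }
            stopped = true;
            return -1;
        }
    }

    @Override
    public int available() throws IOException {
        return stopped ? 0 : in.available();
    }
}
```

It would wrap the decompressor directly, e.g. `new GzipCompressorInputStream(in, true)` (the `decompressConcatenated` constructor), with any buffering or digesting streams layered on top. The trade-offs: genuinely corrupt data after a valid member is silently dropped too, and, depending on where inside a read call the decompressor throws, bytes decoded in that final call may be lost. Both are reasons an option inside Commons Compress would be preferable.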

TY,
Gary

On Tue, Aug 15, 2023, 12:31 PM Tim Allison <talli...@apache.org> wrote:

> Gary,
>
> Sorry for the delay; I'm just back at the keyboard after some time away.
>
> This is an example from a gzip stream. We had similar messages from some
> bzip2 and xz streams.
>
> Caused by: java.io.IOException: Garbage after a valid .gz stream
>         at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.init(GzipCompressorInputStream.java:240)
>         at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.read(GzipCompressorInputStream.java:391)
>         at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:205)
>         at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
>         at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:292)
>         at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
>         at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:205)
>
> Thank you!
>
> On 2023/07/29 14:49:23 Gary Gregory wrote:
> > Hi Tim,
> >
> > Do you have a stack trace? Maybe this is an option we can add...
> >
> > Gary
> >
> > On Wed, Jul 26, 2023, 3:22 PM Tim Allison <talli...@apache.org> wrote:
> >
> > > We recently had a request [0] to change our default behavior to turn on
> > > processing of multiple/concatenated compressor streams for gzip, bzip2,
> > > etc. When we made this change and compared the updated results with our
> > > previous results, we lost quite a few attachments because of the "garbage
> > > after a valid x" exception and because of how we're buffering/digesting
> > > the stream.
> > >
> > > Is there any way to turn on extraction of concatenated compressor streams
> > > but have it silently stop reading instead of throwing a garbage exception?
> > >
> > > Thank you!
> > >
> > > Best,
> > >
> > >         Tim
> > >
> > >
> > > [0] https://issues.apache.org/jira/browse/TIKA-4048
> > >
> >
>
