No need to, IMHO.

Tilman



--- Original Message ---
From: Andreas Lehmkühler
Subject: Re: PDFBox 2.0.29 release?
Date: June 25, 2023, 15:16
To: dev@pdfbox.apache.org




@Tilman thanks for fixing this

Should we run another test before cutting the release?

Andreas

On 03.06.23 at 05:53, Tilman Hausherr wrote:
> Thank you. This is related to PDFBOX-5606. parseNextToken() closes
> the content stream if an error occurs, but it sometimes calls itself.
> Because the content stream is then closed, the method returns null, which
> is reported together with the position. Trying to get the position on a
> closed stream throws the exception.
>
> Tilman
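
To illustrate the failure mode described above, here is a minimal sketch. It is not the PDFBOX-5606 fix and does not use PDFBox code: GuardedSource and both helper methods are hypothetical names invented for this example. It only shows that java.io.PushbackInputStream, which appears in the stack trace quoted further down, throws "Stream closed" when read after close(), and one possible way a caller could treat a closed source as end of input instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

public class ClosedStreamSketch {

    // Hypothetical wrapper (not a PDFBox class) that remembers whether it was closed.
    static final class GuardedSource {
        private final PushbackInputStream in;
        private boolean closed;

        GuardedSource(byte[] data) {
            in = new PushbackInputStream(new ByteArrayInputStream(data));
        }

        void close() throws IOException {
            closed = true;
            in.close();
        }

        // Treats a closed stream as end of input instead of touching it again.
        int read() throws IOException {
            return closed ? -1 : in.read();
        }
    }

    // Reads from a plain PushbackInputStream after close() and returns the error message.
    static String readAfterClose() throws IOException {
        PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(new byte[] {'a', 'b'}));
        in.read();
        in.close();
        try {
            in.read();
            return "no exception";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    // Same sequence through the guarded wrapper: the read after close() yields -1.
    static int guardedReadAfterClose() throws IOException {
        GuardedSource source = new GuardedSource(new byte[] {'a', 'b'});
        source.read();
        source.close();
        return source.read();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterClose());        // prints "Stream closed"
        System.out.println(guardedReadAfterClose()); // prints "-1"
    }
}
```

Whether returning end-of-input or rethrowing is the better behaviour depends on the caller; the sketch only contrasts the two outcomes.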
>
> On 02.06.2023 17:08, Tim Allison wrote:
>> Reports are here:
>> https://corpora.tika.apache.org/base/reports/pdfbox-2.0.29-pre-rc1-reports.tgz
>>
>> One new exception, which is reproducible with the pure PDFBox app's
>> ExtractText:
>> https://corpora.tika.apache.org/base/docs/govdocs1/819/819127.pdf
>>
>> Exception in thread "main" org.apache.tika.exception.TikaException: Unable to extract PDF content
>>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:130)
>>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:212)
>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
>>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
>>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:199)
>>     at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:164)
>>     at org.apache.tika.cli.TikaCLI.handleRecursiveJson(TikaCLI.java:518)
>>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:489)
>>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:256)
>> Caused by: java.io.IOException: Stream closed
>>     at java.base/java.io.PushbackInputStream.ensureOpen(PushbackInputStream.java:75)
>>     at java.base/java.io.PushbackInputStream.read(PushbackInputStream.java:132)
>>     at org.apache.pdfbox.pdfparser.InputStreamSource.read(InputStreamSource.java:47)
>>     at org.apache.pdfbox.pdfparser.BaseParser.skipSpaces(BaseParser.java:1257)
>>     at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:138)
>>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:548)
>>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:516)
>>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
>>     at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:155)
>>     at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:363)
>>     at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:137)
>>     at org.apache.tika.parser.pdf.AbstractPDF2XHTML.processPages(AbstractPDF2XHTML.java:1370)
>>     at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:238)
>>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:108)
>>
>> On Wed, May 31, 2023 at 1:41 PM Tilman Hausherr <thaush...@t-online.de>
>> wrote:
>>
>>> Yes please
>>>
>>> Thanks
>>>
>>> Tilman
>>>
>>> On 31.05.2023 17:15, Tim Allison wrote:
>>>> +1
>>>>
>>>> Let me know when/if I should run the text extraction regression tests.
>>>>
>>>> On Thu, May 25, 2023 at 12:32 PM sahy...@fileaffairs.de
>>>> <sahy...@fileaffairs.de> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Maruan
>>>>>
>>>>> On Wednesday, 24.05.2023 at 07:48 +0200, Andreas Lehmkuehler
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'd like to release 2.0.29 soon because of the regression that was
>>>>>> fixed with PDFBOX-5606.
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>> Andreas
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
>>>>>> For additional commands, e-mail: dev-h...@pdfbox.apache.org
>>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>
>
>
