FlateFilter.java swallows Exceptions (should rethrow)
-----------------------------------------------------

                 Key: PDFBOX-847
                 URL: https://issues.apache.org/jira/browse/PDFBOX-847
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.2.1
            Reporter: Andreas Wollschlaeger


I just rediscovered an issue in FlateFilter.java that I mentioned quite a 
while ago on the mailing list, and which was agreed to be a misfeature :-)

In FlateFilter.java, at lines 115ff, we find this piece of code:

                    try 
                    {
                        // decoding not needed
                        while ((amountRead = decompressor.read(buffer, 0, Math.min(mayRead,BUFFER_SIZE))) != -1)
                        {
                            result.write(buffer, 0, amountRead);
                        }
                    }
                    catch (OutOfMemoryError exception) 
                    {
                        // if the stream is corrupt an OutOfMemoryError may occur
                        log.error("Stop reading corrupt stream");
                    }
                    catch (ZipException exception) 
                    {
                        // if the stream is corrupt an OutOfMemoryError may occur
                        log.error("Stop reading corrupt stream");
                    }
                    catch (EOFException exception) 
                    {
                        // if the stream is corrupt an OutOfMemoryError may occur
                        log.error("Stop reading corrupt stream");
                    }

which means these exceptions are discarded and never reported upstream to the 
caller. This is very unfortunate, as the caller has no means to discover that 
text extraction is incomplete. I discovered this while troubleshooting Alfresco 
DMS, which uses PDFBox for indexing PDF documents: apart from an innocent log 
message, Alfresco has no way of knowing that the conversion has failed.

The proposed solution is to rethrow all three exceptions and let the caller 
handle them.
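
As a rough illustration of the proposal, the read loop could log and then 
rethrow instead of swallowing. This is only a sketch, not the actual 
FlateFilter code: the class name FlateDecodeSketch, the decode and compress 
helpers, the BUFFER_SIZE value and the use of java.util.logging are all 
assumptions made to keep the example self-contained, and the mayRead limit 
from the original loop is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.logging.Logger;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

// Hypothetical stand-in for the decoding part of FlateFilter.
class FlateDecodeSketch {
    private static final Logger LOG =
            Logger.getLogger(FlateDecodeSketch.class.getName());

    // Assumed value; the real constant lives in FlateFilter.java.
    private static final int BUFFER_SIZE = 4096;

    // Inflates 'compressed' into 'result'. Corrupt or truncated input is
    // logged and then rethrown so the caller learns that decoding failed.
    static void decode(InputStream compressed, OutputStream result)
            throws IOException {
        InflaterInputStream decompressor = new InflaterInputStream(compressed);
        byte[] buffer = new byte[BUFFER_SIZE];
        int amountRead;
        try {
            while ((amountRead = decompressor.read(buffer, 0, BUFFER_SIZE)) != -1) {
                result.write(buffer, 0, amountRead);
            }
        } catch (ZipException e) {
            LOG.severe("Corrupt Flate stream: " + e.getMessage());
            throw e; // report upstream instead of discarding
        } catch (EOFException e) {
            LOG.severe("Truncated Flate stream: " + e.getMessage());
            throw e; // report upstream instead of discarding
        }
        // OutOfMemoryError is not caught here, so it propagates as well.
    }

    // Helper for trying the sketch out: zlib-compresses a byte array.
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DeflaterOutputStream deflater = new DeflaterOutputStream(out);
        deflater.write(raw);
        deflater.close();
        return out.toByteArray();
    }
}
```

Since ZipException and EOFException are both subclasses of IOException, a 
caller such as an indexer can wrap the call in a single catch (IOException e) 
block and mark the document as failed rather than indexing partial text.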


