[ 
https://issues.apache.org/jira/browse/PDFBOX-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500167#comment-14500167
 ] 

ASF subversion and git services commented on PDFBOX-2762:
---------------------------------------------------------

Commit 1674353 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1674353 ]

PDFBOX-2762: remove parseCOSStream() call from PDFStreamParser, because there 
are no streams in content streams

> remove parseCOSStream() call from PDFStreamParser
> -------------------------------------------------
>
>                 Key: PDFBOX-2762
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2762
>             Project: PDFBox
>          Issue Type: Task
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: Tilman Hausherr
>             Fix For: 2.0.0
>
>
> This code is found in PDFStreamParser
> {code}
>                 if (c == '<')
>                 {
>                     COSDictionary pod = parseCOSDictionary();
>                     skipSpaces();
>                     if ((char)pdfSource.peek() == 's')
>                     {
>                         retval = parseCOSStream( pod );
>                     }
>                     else
>                     {
>                         retval = pod;
>                     }
>                 }
> {code}
> This is incorrect. PDFStreamParser is for content streams, and there are no 
> streams in content streams: the spec requires that "All streams shall be 
> indirect objects". An "indirect object" is something between obj and endobj, 
> but indirect objects are not allowed in content streams: "Indirect objects 
> and object references shall not be permitted at all". So parseCOSStream() 
> will never be called. Thus the new code will be:
> {code}
>                 if (c == '<')
>                 {
>                     retval = parseCOSDictionary();
>                 }
> {code}
> To be sure, I ran my own test set and the digitalcorpora set (250000 files) 
> to see whether parseCOSStream() is ever called in PDFStreamParser. It isn't. 
> How did this incorrect code end up there? I don't know, but it has been there 
> since 2002.
> http://pdfbox.cvs.sourceforge.net/viewvc/pdfbox/pdfbox/src/org/pdfbox/pdfparser/PDFStreamParser.java?revision=1.1&view=markup
> Why do I care about this? It is related to a posting in a mailing list by 
> Andrea Vacondio who mentioned that there are several versions of 
> parseCOSStream(), so I'm trying to clean up.
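
The check described in the issue (does a dictionary inside a content stream ever turn out to be the start of a stream?) can be approximated without PDFBox. The following is only a minimal sketch: the class ContentStreamCheck and its method are hypothetical names, not PDFBox API and not the test that was actually run, and the scan is deliberately naive (it does not skip string literals, comments or inline image data).

```java
// Simplified sketch; class and method names are hypothetical, not PDFBox API.
final class ContentStreamCheck
{
    private ContentStreamCheck()
    {
    }

    // PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE.
    private static boolean isPdfWhitespace(char c)
    {
        return c == 0 || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    }

    /**
     * Returns true if any ">>" in the text is followed, after optional
     * white space, by the keyword "stream". Naive on purpose: string
     * literals, comments and inline image data are not skipped.
     */
    static boolean dictionaryFollowedByStream(String content)
    {
        int pos = content.indexOf(">>");
        while (pos >= 0)
        {
            int next = pos + 2;
            while (next < content.length() && isPdfWhitespace(content.charAt(next)))
            {
                next++;
            }
            if (content.startsWith("stream", next))
            {
                return true;
            }
            pos = content.indexOf(">>", pos + 2);
        }
        return false;
    }

    public static void main(String[] args)
    {
        // A content stream: the dictionary is an inline property list, nothing more.
        String contentStream = "/Span << /MCID 0 >> BDC\nBT /F1 12 Tf (Hello) Tj ET\nEMC";
        // The body of an indirect object: here the dictionary does introduce a stream.
        String indirectObject = "<< /Length 5 >>\nstream\nHello\nendstream";

        System.out.println(dictionaryFollowedByStream(contentStream));
        System.out.println(dictionaryFollowedByStream(indirectObject));
    }
}
```

In a well-formed content stream the method should never return true, which is the same observation that makes the peek for 's' after parseCOSDictionary() unnecessary.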



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
