göktürk mavuş wrote
> For some PDF documents, the code below throws the exception shown after
> it. I do not understand why this exception is thrown for some documents
> but not for others. More urgently, how can I solve this problem?
> 
> 
>       ...
>       dictionary = reader.getPageN(currentPage);
>       reference = (PRIndirectReference) dictionary.get(PdfName.CONTENTS);
>       /*line 166*/ contentStream = (PRStream) PdfReader.getPdfObject(reference);
>       ...
> 
> **Exception :**
> 
>       java.lang.ClassCastException: com.itextpdf.text.pdf.PdfArray cannot be cast to com.itextpdf.text.pdf.PRStream
>       at pdfCrawler.retrieveContentOfPdf(CrawlerTask.java:166)
>       ...

What makes you think that CONTENTS always resolves to a single PDF stream
object? The exception tells you that, for the failing documents, it resolves
to a PdfArray instead.

The specification clearly states that it can also be an array of streams:

> Contents
> stream or array
> (Optional) A content stream (see 7.8.2, "Content Streams") that shall
> describe the contents of this page. If this entry is absent, the page
> shall be empty. 
> The value shall be either a single stream or an array of streams. If the
> value is an array, the effect shall be as if all of the streams in the
> array were concatenated, in order, to form a single stream. Conforming
> writers can create image objects and other resources as they occur, even
> though they interrupt the content stream. The division between streams may
> occur only at the boundaries between lexical tokens (see 7.2, "Lexical
> Conventions") but shall be unrelated to the page’s logical content or
> organization. Applications that consume or produce PDF files need not
> preserve the existing structure of the Contents array. Conforming writers
> shall not create a Contents array containing no elements.

(Table 30 in
http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf)
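As a minimal sketch of the check that the quoted code skips, the following models the rule from Table 30 in plain Java: look at what Contents actually is, and concatenate the streams if it is an array. The types `PdfObj`, `StreamObj` and `ArrayObj` and the method `pageContent` are hypothetical stand-ins invented for this illustration, not iText classes. The newline written between streams is my own defensive choice so that tokens from adjacent streams cannot run together.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class ContentsDemo {
    // Hypothetical stand-ins for the two shapes a Contents value can take.
    interface PdfObj {}

    static final class StreamObj implements PdfObj {
        final byte[] bytes;
        StreamObj(String text) {
            this.bytes = text.getBytes(StandardCharsets.US_ASCII);
        }
    }

    static final class ArrayObj implements PdfObj {
        final List<PdfObj> items;
        ArrayObj(PdfObj... items) {
            this.items = Arrays.asList(items);
        }
    }

    // Returns the page content as one byte sequence, whatever shape Contents has.
    static byte[] pageContent(PdfObj contents) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (contents == null) {
            // Contents is optional; an absent entry means an empty page.
            return out.toByteArray();
        }
        if (contents instanceof StreamObj) {
            append(out, (StreamObj) contents);
        } else if (contents instanceof ArrayObj) {
            boolean first = true;
            for (PdfObj item : ((ArrayObj) contents).items) {
                if (!(item instanceof StreamObj)) {
                    throw new IllegalArgumentException("array element is not a stream");
                }
                if (!first) {
                    out.write('\n'); // separator between streams (assumption, see above)
                }
                append(out, (StreamObj) item);
                first = false;
            }
        } else {
            throw new IllegalArgumentException("Contents is neither a stream nor an array");
        }
        return out.toByteArray();
    }

    private static void append(ByteArrayOutputStream out, StreamObj stream) {
        out.write(stream.bytes, 0, stream.bytes.length);
    }

    public static void main(String[] args) {
        PdfObj single = new StreamObj("BT /F1 12 Tf ET");
        PdfObj split = new ArrayObj(new StreamObj("q 1 0 0 1 0 0 cm"), new StreamObj("Q"));
        System.out.println(new String(pageContent(single), StandardCharsets.US_ASCII));
        System.out.println(new String(pageContent(split), StandardCharsets.US_ASCII));
    }
}
```

In your own code the same branching would be done on the object returned by PdfReader.getPdfObject before any cast to PRStream. If I remember the iText 5 API correctly, reader.getPageContent(currentPage) already returns the concatenated bytes for both cases, so please check that before writing this by hand.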

Regards, Michael





