Hi Jingguo,

Jeremy developed a new parser that parses only parts of the document; maybe his parser can handle your problem.
You can follow the conversation [1] on the developer mailing list [2].
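
To illustrate why partial parsing is possible at all: a PDF file ends with a "startxref" keyword followed by the byte offset of its cross-reference data, so a reader can start from the tail of the file instead of scanning everything. The sketch below is not Jeremy's parser and not PDFBox code; the class name PdfTailReader and the method findStartXref are made up for this example, and it assumes the keyword sits within the last tailSize bytes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of PDFBox.
public class PdfTailReader {

    /**
     * Returns the number that follows the last "startxref" keyword,
     * reading at most the final tailSize bytes of the file.
     */
    public static long findStartXref(String path, int tailSize) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            int toRead = (int) Math.min(length, tailSize);
            byte[] tail = new byte[toRead];
            // Jump straight to the tail; the rest of the file is never read.
            file.seek(length - toRead);
            file.readFully(tail);

            // ISO-8859-1 maps every byte to one char, so binary data is harmless here.
            String text = new String(tail, StandardCharsets.ISO_8859_1);
            int keyword = text.lastIndexOf("startxref");
            if (keyword < 0) {
                throw new IOException("startxref not found in the last " + toRead + " bytes");
            }

            // The offset is the first whitespace-delimited token after the keyword.
            String rest = text.substring(keyword + "startxref".length()).trim();
            String[] tokens = rest.split("\\s+");
            try {
                return Long.parseLong(tokens[0]);
            } catch (NumberFormatException e) {
                throw new IOException("no numeric offset after startxref", e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PdfTailReader some.pdf
        System.out.println(findStartXref(args[0], 1024));
    }
}
```

A real parser would continue from that offset to the trailer and then to the catalog and info dictionaries; this only shows the first step.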

Best regards
Thomas

[1] Subject: Pdfbox
http://pdfbox.markmail.org/search/?q=Jeremy+Villalobos#query:Jeremy%20Villalobos+page:1+mid:vnojjjkkzyjmmlbd+state:results

Mail 1: 10.07.2011 21:58:02 CEST
Mail 2: 03.08.2011 11:55:55 CEST
Mail 3: 15.08.2011 12:11:27 CEST

[2]
http://pdfbox.apache.org/mail-lists.html
[email protected]


Quoting jingguo yao <[email protected]>:

The standard way to get PDDocumentInformation and PDDocumentCatalog is
through the following code:

    PDDocument doc = PDDocument.load(inputStream);
    PDDocumentInformation info = doc.getDocumentInformation();
    PDDocumentCatalog cat = doc.getDocumentCatalog();

All I want from a PDF file is a few properties, such as the number of pages
and the modification date. I don't need anything else. I often need to parse
big PDF files, and it seems that the above code loads the whole input stream
for the PDF file. Is there a faster way to get these properties? Thanks.

--
Jingguo


