An exception occurred in parsing the PDF Document.

Jack Bush Thu, 11 Aug 2011 01:31:04 -0700
Hi All,
 
I am getting the following exception when converting many PDFs to text files 
(in a loop):
 
ABC.pdf
PDF to Text conversion of ABC.txt has succeeded
An exception occured in parsing the PDF Document.
java.io.IOException: Error: Header doesn't contain versioninfo
XYZ.pdf
            at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:312)
            at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:169)
            at PDF2Text.PDFTextParser.pdftoText(PDFTextParser.java:39)
            at hpg.ImportHPGData.main(ImportHPGData.java:43)
 
Below is the PDFBox example, with the pdftoText() and writeTexttoFile() methods 
merged into one:
 
    public boolean pdftoText(String pdfSource, String txtTarget) {
 
        try {
            parser = new PDFParser(new FileInputStream(new File(pdfSource)));
        } catch (Exception e) {
            System.out.println("Unable to open " + pdfSource);
        }
 
        try {
            parser.parse();
            cosDoc = parser.getDocument();
            pdfStripper = new PDFTextStripper();
            pdDoc = new PDDocument(cosDoc);
            parsedText = pdfStripper.getText(pdDoc);
            if (parsedText == null) {
                System.out.println("File " + pdfSource + " has failed PDF to Text Conversion.");
                return false;
            } else {
                BufferedWriter txtTargetBW = new BufferedWriter(new FileWriter(txtTarget));
                txtTargetBW.write(parsedText);
                txtTargetBW.close();
                try {
                    if (cosDoc != null) cosDoc.close();
                    if (pdDoc != null) pdDoc.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        } catch (Exception e) {
            System.out.println("An exception occured in parsing the PDF Document.");
            e.printStackTrace();
        }
        return true;
    }
 
Any idea why this is occurring? I had no problem converting individual files, 
or using the original example where the two methods were separate.
 
Thanks in advance,
 
Jack
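
The message "Header doesn't contain versioninfo" suggests that the parser did 
not find a version header such as %PDF-1.4 at the start of the input. One way 
to narrow this down is to check whether each file in the loop really begins 
with that marker. The following is a minimal sketch using only the Java 
standard library; the class name PdfHeaderCheck, the method hasPdfHeader() and 
the 1024-byte scan window are assumptions made for illustration and are not 
part of PDFBox.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox: reports whether a file appears to
// contain a PDF version header such as "%PDF-1.4" near its start.
class PdfHeaderCheck {

    // Assumption: the marker is expected within the first 1024 bytes.
    private static final int SCAN_LIMIT = 1024;

    static boolean hasPdfHeader(Path file) throws IOException {
        byte[] buf = new byte[SCAN_LIMIT];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // read() may return fewer bytes than requested, so loop until
            // the buffer is full or the stream ends.
            while (total < buf.length
                    && (n = in.read(buf, total, buf.length - total)) != -1) {
                total += n;
            }
        }
        // ISO-8859-1 maps every byte to exactly one char, so arbitrary
        // binary content can be decoded without errors.
        String head = new String(buf, 0, total, StandardCharsets.ISO_8859_1);
        return head.contains("%PDF-");
    }

    public static void main(String[] args) throws IOException {
        // Check every file named on the command line.
        for (String name : args) {
            Path file = Paths.get(name);
            System.out.println(name + ": "
                    + (hasPdfHeader(file) ? "has a %PDF- header" : "no %PDF- header found"));
        }
    }
}
```

Running it against the same files as the conversion loop (for example 
"java PdfHeaderCheck ABC.pdf XYZ.pdf") shows which inputs lack the marker. If 
XYZ.pdf is reported as having no header, the file itself, or the path passed 
in the loop, is a more likely cause than the merged method.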