Hi All,
I am getting the following exception when trying to convert many PDF files to
text files in a loop:
ABC.pdf
PDF to Text conversion of ABC.txt has succeeded
An exception occured in parsing the PDF Document.
java.io.IOException: Error: Header doesn't contain versioninfo
XYZ.pdf
at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:312)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:169)
at PDF2Text.PDFTextParser.pdftoText(PDFTextParser.java:39)
at hpg.ImportHPGData.main(ImportHPGData.java:43)
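The trace ends in parseHeader(), and the message says the parser found no version information in the header. A PDF file normally begins with a "%PDF-" marker followed by the version number, so one way to narrow this down is to check each input file for that marker before handing it to PDFBox. The sketch below is only an illustration: the class name PdfHeaderCheck, the helper hasPdfHeader() and the 1024-byte search window are assumptions, not part of PDFBox. Note also that the "XYZ.pdf" line in the middle of the trace looks like standard output interleaved with standard error, so the failing file may not be XYZ.pdf itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfHeaderCheck {

    // Hypothetical helper: returns true if "%PDF-" appears in the first
    // 1024 bytes of the file. The window size is an arbitrary assumption.
    public static boolean hasPdfHeader(Path file) throws IOException {
        byte[] buf = new byte[1024];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // Keep reading until the buffer is full or the file ends.
            while (total < buf.length
                    && (n = in.read(buf, total, buf.length - total)) > 0) {
                total += n;
            }
        }
        // ISO-8859-1 maps every byte to one char, so binary data is safe here.
        String head = new String(buf, 0, total, StandardCharsets.ISO_8859_1);
        return head.contains("%PDF-");
    }

    public static void main(String[] args) throws IOException {
        // Pass the files from the conversion loop as command-line arguments.
        for (String name : args) {
            Path file = Paths.get(name);
            System.out.println(name + ": "
                    + (hasPdfHeader(file) ? "has a %PDF- header" : "no %PDF- header found"));
        }
    }
}
```

Running this over the same list of files would show whether any of them is empty, truncated, or not a PDF at all, which is a separate question from whether merging the two methods changed anything.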
Below is the PDFBox example, with the pdftoText() and writeTexttoFile() methods
merged into one:
public boolean pdftoText(String pdfSource, String txtTarget) {
    // parser, cosDoc, pdDoc, pdfStripper and parsedText are fields of the
    // class; clear anything left over from the previous file in the loop.
    parser = null;
    cosDoc = null;
    pdDoc = null;
    try {
        parser = new PDFParser(new FileInputStream(new File(pdfSource)));
    } catch (Exception e) {
        System.out.println("Unable to open " + pdfSource);
        return false; // without a parser there is nothing to parse
    }
    try {
        parser.parse();
        cosDoc = parser.getDocument();
        pdfStripper = new PDFTextStripper();
        pdDoc = new PDDocument(cosDoc);
        parsedText = pdfStripper.getText(pdDoc);
        if (parsedText == null) {
            System.out.println("File " + pdfSource + " has failed PDF to Text Conversion.");
            return false;
        }
        BufferedWriter txtTargetBW = new BufferedWriter(new FileWriter(txtTarget));
        txtTargetBW.write(parsedText);
        txtTargetBW.close();
        return true;
    } catch (Exception e) {
        System.out.println("An exception occured in parsing the PDF Document.");
        e.printStackTrace();
        return false; // report the failure instead of returning true
    } finally {
        // Always release the documents, including when parsing fails.
        try {
            if (cosDoc != null) cosDoc.close();
            if (pdDoc != null) pdDoc.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
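Since the method is called in a loop, it can help to record exactly which files fail rather than relying on the order of console output. The following is a minimal sketch of such a driver, not the actual loop in ImportHPGData (which is not shown here): the class BatchConvert, the method convertAll() and the ".pdf" to ".txt" naming rule are all hypothetical, and the converter is passed in as a BiPredicate so the sketch runs without PDFBox. It only helps if the conversion method returns false on every failure path, including the catch block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

public class BatchConvert {

    // Hypothetical driver: runs `converter` on each source file and returns
    // the sources for which it reported failure (returned false).
    public static List<String> convertAll(List<String> pdfSources,
                                          BiPredicate<String, String> converter) {
        List<String> failed = new ArrayList<>();
        for (String pdfSource : pdfSources) {
            // Assumed naming rule: replace a trailing ".pdf" with ".txt".
            String txtTarget = pdfSource.replaceAll("(?i)\\.pdf$", "") + ".txt";
            if (!converter.test(pdfSource, txtTarget)) {
                failed.add(pdfSource);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Stand-in converter for illustration only: "fails" on any name
        // containing "bad". A real run would pass the PDFBox-based method.
        BiPredicate<String, String> fake = (src, dst) -> !src.contains("bad");
        List<String> failed = convertAll(Arrays.asList("ABC.pdf", "bad.pdf"), fake);
        System.out.println("Failed: " + failed);
    }
}
```

With the real method, the call would look something like convertAll(files, textParser::pdftoText), where textParser is an instance of the class that holds pdftoText().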
Is there any reason why this exception is occurring? I had no problem converting
individual files, or using the original example where the two methods were separate.
Thanks in advance,
Jack