Hi,
On 26.09.2013 13:36, Tapani Vaulasto wrote:
Hi,
I use PDFBox 1.8.2 and this code to convert a PDF to txt-file:
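// Load the PDF and write its extracted text to the output file.
PDDocument pd = PDDocument.load(input);
PDFTextStripper stripper = new PDFTextStripper();
BufferedWriter wr = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(output)));
stripper.writeText(pd, wr);
// Close the writer so buffered text is flushed, then release the document.
wr.close();
pd.close();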
The PDF document contains tables.
The problem is that sometimes a table row has one or more empty columns.
Like here:
http://www.tulli.fi/fi/yksityisille/autoverotus/taulukot/autot/au/1308.pdf
On page 2 (of 44), some ALFA ROMEO rows have an empty column.
Question: How can I get every column, including the empty ones, marked on each line written to the BufferedWriter?
Sorry, this can't be done with PDFBox. You have to analyze the text on your own.
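As a rough idea of what analyzing the text on your own could look like, here is a minimal sketch in plain Java. It assumes you have already collected, for each line, the text pieces together with their x coordinates (for example from the TextPosition objects PDFBox hands to a PDFTextStripper subclass; that part is not shown), and that you know roughly where each column starts. The names ColumnFiller, Fragment, toRow and columnStarts are made up for this illustration, and the coordinates and cell values are invented sample data, not taken from the PDF above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnFiller {

    /** A piece of text and the x coordinate where it starts on the page. */
    static final class Fragment {
        final double x;
        final String text;

        Fragment(double x, String text) {
            this.x = x;
            this.text = text;
        }
    }

    /**
     * Puts each fragment into the last column whose left edge is at or before
     * the fragment's x, then joins all columns with ';'. Columns that received
     * no fragment stay empty, so every row has the same number of fields.
     * columnStarts must be sorted in ascending order.
     */
    static String toRow(List<Fragment> line, double[] columnStarts) {
        String[] cells = new String[columnStarts.length];
        Arrays.fill(cells, "");
        for (Fragment f : line) {
            int col = 0; // fragments left of the first edge go to column 0
            for (int i = 0; i < columnStarts.length; i++) {
                if (f.x >= columnStarts[i]) {
                    col = i;
                }
            }
            cells[col] = cells[col].isEmpty() ? f.text : cells[col] + " " + f.text;
        }
        return String.join(";", cells);
    }

    public static void main(String[] args) {
        // Hypothetical left edges of four columns (assumption, not measured).
        double[] columnStarts = {50, 200, 300, 400};

        List<Fragment> line = new ArrayList<>();
        line.add(new Fragment(52, "ALFA ROMEO"));
        line.add(new Fragment(205, "147"));  // invented sample value
        // No fragment near x = 300: the third column is empty on this line.
        line.add(new Fragment(410, "2004")); // invented sample value

        System.out.println(toRow(line, columnStarts)); // prints ALFA ROMEO;147;;2004
    }
}
```

The fixed column edges are the weak point of this approach: they have to be worked out per document, for instance from a header row or from a line where every column is filled.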
Regards
Tapani Vaulasto
BR
Andreas Lehmkühler