Hi guys,
I'm using PDFBox 1.8.10 to extract some text from a PDF (see attachment).
The extracted text comes out in the wrong order: the words within the first line are scrambled.
Got:
1/435 S LOPES CÂNDIDO FELIX LOPESABEL DIA 27-09-1964
FRANCISCA MARIA DIAS
Was expecting:
1/435 ABEL DIAS LOPES CÂNDIDO FELIX LOPES 27-09-1964
FRANCISCA MARIA DIAS
My simple code:
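import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// FILE_PATH is the path to the attached PDF
PDDocument pdf = PDDocument.load(new File(FILE_PATH));
PDFTextStripper stripper = new PDFTextStripper();
// extract only the first page
stripper.setStartPage(1);
stripper.setEndPage(1);
stripper.setSortByPosition(true);
String plainText = stripper.getText(pdf);
System.out.println(plainText);
pdf.close();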
Thanks in advance.