Hi guys,

I'm using PDFBox 1.8.10 to extract some text from a PDF (see attachment).

The text within the first extracted line comes out in the wrong order, even though I call setSortByPosition(true).

Got:

1/435 S LOPES CÂNDIDO FELIX LOPESABEL DIA 27-09-1964
FRANCISCA MARIA DIAS

Expected:

1/435 ABEL DIAS LOPES CÂNDIDO FELIX LOPES 27-09-1964
FRANCISCA MARIA DIAS

My simple code:

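        import java.io.File;
        import org.apache.pdfbox.pdmodel.PDDocument;
        import org.apache.pdfbox.util.PDFTextStripper;

        // Runs inside a method that declares "throws IOException"
        PDDocument pdf = PDDocument.load(new File(FILE_PATH));
        try {
            PDFTextStripper stripper = new PDFTextStripper();

            // Extract page 1 only, ordering the text by its position on the page
            stripper.setStartPage(1);
            stripper.setEndPage(1);
            stripper.setSortByPosition(true);

            String plainText = stripper.getText(pdf);
            System.out.println(plainText);
        } finally {
            pdf.close(); // release the document's resources
        }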


Thanks in advance.
                                          