Re: Identify not visible characters - Overlapped characters

Tilman Hausherr Tue, 27 Dec 2016 16:02:07 -0800

Please upload the PDF somewhere.

Tilman


On 28.12.2016 at 00:52, [email protected] wrote:

Hello everyone,
I am using PDFBox 1.8.12 (because I’m developing in C#), and I can extract all characters from a PDF along with their positions.
My objective is to perform a layout analysis and try to reproduce the PDF layout in a text file.
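For the layout-reproduction step, one possible approach is to map each character's page position onto a fixed-width character grid. This is a minimal sketch under stated assumptions, not PDFBox API: `Glyph` is a hypothetical record holding a character and its top-left position in points (y growing downward), i.e. values one would copy from the positions already extracted with PDFBox. `cellWidth` and `cellHeight` are assumed tuning parameters, roughly the average character width and line height.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: lay extracted characters out on a fixed-width text grid.
// Glyph is a hypothetical record, not a PDFBox class.
public class TextGrid {

    // Hypothetical per-character record: the character and its top-left
    // position on the page in points, with y growing downward.
    public record Glyph(String text, float x, float y) {}

    // Each grid cell covers cellWidth x cellHeight points. A glyph is placed in
    // the cell containing its (x, y) position; if two glyphs land in the same
    // cell, the later one in the list overwrites the earlier one.
    // Assumes non-negative coordinates.
    public static List<String> render(List<Glyph> glyphs, float cellWidth, float cellHeight) {
        int maxRow = 0;
        int maxCol = 0;
        for (Glyph g : glyphs) {
            maxRow = Math.max(maxRow, (int) (g.y() / cellHeight));
            maxCol = Math.max(maxCol, (int) (g.x() / cellWidth));
        }
        char[][] grid = new char[maxRow + 1][maxCol + 1];
        for (char[] row : grid) {
            Arrays.fill(row, ' ');
        }
        for (Glyph g : glyphs) {
            if (g.text().isEmpty()) {
                continue; // nothing to draw
            }
            int row = (int) (g.y() / cellHeight);
            int col = (int) (g.x() / cellWidth);
            grid[row][col] = g.text().charAt(0);
        }
        List<String> lines = new ArrayList<>();
        for (char[] row : grid) {
            lines.add(new String(row).stripTrailing()); // drop trailing padding
        }
        return lines;
    }
}
```

Because a later glyph simply overwrites an earlier one in the same cell, invisible text that overlaps visible text ends up mixed into the output. Such characters therefore need to be filtered out before this step.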
However, I’m facing a huge problem: identifying characters that are not visible.
In the attached file, the text “Alandroal (Nossa Senhora da Conceic…” overlaps part of the space occupied by the word “Rural” (row 5), but it is not visible.
I would like someone to help me find a way to identify the text that is not visible, so that I can leave those characters out of the text file.
This approach: http://stackoverflow.com/questions/19809813/how-to-check-if-a-text-is-transparent-with-pdfbox doesn’t work with the attached file (it only works with images).
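One heuristic that may help narrow the problem down is to flag characters whose bounding boxes overlap other characters, since the reported case is overlapping text. This approach is not confirmed anywhere in this thread. The sketch only finds overlapping pairs. It does not decide which character of a pair is hidden: in a PDF that can depend on clipping paths, an invisible text rendering mode, or shapes painted over the text, and this check inspects none of these. `Box` is a hypothetical record (top-left corner plus width and height in points), not a PDFBox class. The tolerance `tol` is an assumed parameter that ignores the slight touching of neighbouring glyphs within a word.

```java
import java.util.ArrayList;
import java.util.List;

// Heuristic sketch: list pairs of characters whose bounding boxes overlap.
// Box is a hypothetical record, not a PDFBox class.
public class OverlapCheck {

    // Hypothetical per-character box in page points (top-left corner plus size),
    // listed in the order the characters were extracted.
    public record Box(String text, float x, float y, float width, float height) {

        // True if the rectangles overlap by more than tol points both
        // horizontally and vertically.
        boolean overlaps(Box o, float tol) {
            float dx = Math.min(x + width, o.x + o.width) - Math.max(x, o.x);
            float dy = Math.min(y + height, o.y + o.height) - Math.max(y, o.y);
            return dx > tol && dy > tol;
        }
    }

    // Returns index pairs {i, j} with i < j whose boxes overlap.
    // Compares every pair, so the cost grows quadratically with the glyph count.
    public static List<int[]> overlappingPairs(List<Box> boxes, float tol) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < boxes.size(); i++) {
            for (int j = i + 1; j < boxes.size(); j++) {
                if (boxes.get(i).overlaps(boxes.get(j), tol)) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }
}
```

For example, two 5-point-wide boxes at x = 0 and x = 4 overlap by 1 point. They are reported with `tol = 0.5` but not with `tol = 2`.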
Many thanks in advance,

Paulo Sergio



