Frank,

Thanks for explaining this. 

What I am trying to do is read sentences from the PDF using TextPosition.
Your explanation is clear, and I can detect a new line using X and Y. But what
if a sentence is written across two lines? Reading the Y-coordinate of the
second line will result in treating it as a new sentence instead of
as a continuation of the sentence started on the first line.


Best regards,
Hesham

------------------------------------------------------------------------
Included message:

Hi Hesham,

There is no newline character in a PDF. Only printable characters are
saved, each with its X and Y coordinates.
If you sort the TextPositions by Y and X, you can detect 'newlines' by
finding an increase in Y and a decrease in X. However, this isn't
foolproof, since things like subscripts and superscripts are out of order
when sorted by Y. Where there are multiple columns, this won't work.
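
As a rough sketch of the sort-and-compare idea above (this is not PDFBox
code: Glyph is a hypothetical stand-in for the X, Y, and character you would
read from each TextPosition, and yTolerance is an assumed threshold you would
need to tune per document):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LineBreakSketch {

    // Hypothetical stand-in for the values read from a TextPosition
    // (its X, its Y, and its character); not a PDFBox class.
    static final class Glyph {
        final float x;
        final float y;
        final String text;

        Glyph(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Sorts glyphs by Y, then X, and starts a new line whenever Y increases
    // by more than yTolerance while X decreases.
    static List<String> toLines(List<Glyph> glyphs, float yTolerance) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y)
                .thenComparingDouble(g -> g.x));

        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Glyph previous = null;
        for (Glyph g : sorted) {
            boolean newLine = previous != null
                    && g.y - previous.y > yTolerance
                    && g.x < previous.x;
            if (newLine) {
                lines.add(current.toString());
                current.setLength(0);
            }
            current.append(g.text);
            previous = g;
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Two lines of two characters each, supplied out of order.
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph(10f, 112f, "c"));
        glyphs.add(new Glyph(20f, 100f, "b"));
        glyphs.add(new Glyph(20f, 112f, "d"));
        glyphs.add(new Glyph(10f, 100f, "a"));
        System.out.println(toLines(glyphs, 2f)); // prints [ab, cd]
    }
}
```

The same caveats apply to the sketch: a subscript or superscript with a
slightly different Y sorts out of reading order, a line that starts further
right than the previous line ended is not detected, and multi-column pages
are not handled.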

Frank


On Wed, Apr 22, 2015 at 7:33 AM, Hesham G. <[email protected]> wrote:

> Hello,
>
> When reading PDF text using TextPosition, is there a way to know if the
> current character is a newline character?
>
> protected void processTextPosition( TextPosition text )  {
>     // Prints space if this is a new line character in the PDF file.
>     System.out.println( text.getCharacter() );
> }
>
>
> Best regards,
> Hesham
