The NLP sentence segmenter was a really helpful idea.
Thanks a lot, John and Frank.


Best regards,
Hesham

------------------------------------------------------------------------
Included message:

What have you got so far?  Can you provide sample code to work with?

On Wed, Apr 22, 2015 at 12:02 PM, Hesham G. <[email protected]> wrote:

Frank,

As you suggested, I have used the X and Y coordinates of the TextPositions
to detect new lines. That works fine, but when a sentence spans two lines
I can't detect it. If you know a trick for detecting that, it would help a lot.
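One way to handle a sentence that spans two lines is to stop treating each detected line as a unit: join the lines with spaces, then run a sentence segmenter over the joined text. The minimal sketch below assumes the lines have already been extracted as plain strings. It uses the JDK's rule-based `java.text.BreakIterator`, swapped in for illustration; it is not the NLP sentence segmenter mentioned at the top of this thread. The `SentenceSketch` class and its `sentences` method are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSketch {

    /** Joins extracted lines with spaces, then splits the joined text into sentences. */
    static List<String> sentences(List<String> lines) {
        String text = String.join(" ", lines);
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);

        List<String> result = new ArrayList<>();
        int start = it.first();
        // Each [start, end) range between consecutive boundaries is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            result.add(text.substring(start, end).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "This sentence starts on one line and",
            "ends on the next. This one is short.");
        sentences(lines).forEach(System.out::println);
    }
}
```

Words hyphenated across a line break are not rejoined here, and a rule-based splitter can misfire on abbreviations such as "e.g.", which is where a statistical NLP segmenter tends to do better.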

Best regards,
Hesham

------------------------------------------------------------------------

Hi Hesham,

A PDF has no newline characters. Only the printable characters are stored,
each with its own X and Y coordinates.
If you sort the TextPositions by Y and then X, you can detect 'newlines' by
looking for an increase in Y together with a decrease in X. This isn't
foolproof, though: subscripts and superscripts end up out of order when
sorted by Y, and the approach won't work at all on pages with multiple columns.
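The sort-and-compare heuristic above can be sketched in plain Java. The `Glyph` record, the `reconstruct` method and the `yTolerance` parameter are hypothetical stand-ins for illustration; in PDFBox you would collect the TextPosition objects and read their coordinates with `getX()` and `getY()`. The sketch assumes Y grows down the page, which is what "an increase in Y" for the next line implies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LineBreakSketch {

    /** Hypothetical stand-in for a TextPosition: one glyph and its coordinates. */
    record Glyph(String text, float x, float y) {}

    /**
     * Sorts glyphs by Y, then X, and inserts '\n' wherever Y increases by more
     * than yTolerance while X decreases.
     */
    static String reconstruct(List<Glyph> glyphs, float yTolerance) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble(Glyph::y).thenComparingDouble(Glyph::x));

        StringBuilder out = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : sorted) {
            // Newline heuristic: moved down the page and back to the left.
            if (prev != null && g.y() - prev.y() > yTolerance && g.x() < prev.x()) {
                out.append('\n');
            }
            out.append(g.text());
            prev = g;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = List.of(
            new Glyph("d", 30f, 120f), // second line, deliberately out of order
            new Glyph("c", 20f, 120f),
            new Glyph("a", 20f, 100f),
            new Glyph("b", 30f, 100f));
        System.out.println(reconstruct(glyphs, 2f)); // prints "ab", then "cd" on the next line
    }
}
```

The sketch does nothing about the caveats above: subscripts, superscripts and multi-column layouts will still come out wrong. Because the sort uses exact Y values, glyphs on one line whose baselines differ slightly can also be reordered.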

Frank


On Wed, Apr 22, 2015 at 7:33 AM, Hesham G. <[email protected]> wrote:

Hello,

When reading PDF text using TextPosition, is there a way to know whether the
current character is a newline character?

@Override
protected void processTextPosition(TextPosition text) {
    // Prints a space where the PDF file appears to have a line break.
    System.out.println(text.getCharacter());
}


Best regards,
Hesham



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



