A proper sentence ends with a period, so if a line has not yet reached a period, text that sits one character height below it can be assumed to continue the same sentence (joined with a space). If you have the font and know the font size, you should be able to calculate one character height. If sentences aren't ended with periods, text that is more than one character height below the previous line can be assumed to start a new sentence on a new line.
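
Here is a rough sketch of that heuristic in Java, in case it helps. It is not PDFBox code: the Line class, the joinLines method and the TOLERANCE factor are all made up for illustration. It assumes you have already grouped your TextPositions into visual lines, and that you have a top-down Y and a character height for each line (in PDFBox these would presumably come from something like TextPosition.getYDirAdj() and getHeightDir()). The tolerance is there because real line spacing is usually a bit more than one glyph height; you would need to tune it for your documents.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceJoiner {

    /** Hypothetical holder for one visual line of text (not a PDFBox type). */
    static final class Line {
        final String text;
        final float y;       // top-down Y: a larger value is lower on the page
        final float height;  // character height for this line

        Line(String text, float y, float height) {
            this.text = text;
            this.y = y;
            this.height = height;
        }
    }

    // Assumption: a line counts as "one character height down" if the gap
    // is at most 1.5 character heights. Tune this for your documents.
    static final float TOLERANCE = 1.5f;

    /** Merges lines (already sorted top to bottom) into sentence-like chunks. */
    static List<String> joinLines(List<Line> lines) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Line previous = null;

        for (Line line : lines) {
            String text = line.text.trim();
            if (text.isEmpty()) {
                continue; // ignore empty lines entirely
            }
            if (previous != null) {
                float gap = line.y - previous.y;
                boolean ended = previous.text.trim().endsWith(".");
                boolean adjacent = gap > 0 && gap <= previous.height * TOLERANCE;
                if (ended || !adjacent) {
                    // Previous line ended with a period, or this line is too
                    // far down: start a new sentence.
                    sentences.add(current.toString());
                    current.setLength(0);
                } else {
                    // Same sentence continued on the next line.
                    current.append(' ');
                }
            }
            current.append(text);
            previous = line;
        }
        if (current.length() > 0) {
            sentences.add(current.toString());
        }
        return sentences;
    }
}
```

With lines at Y = 100, 112, 124 and 150 (height 10) reading "A sentence that wraps", "onto a second line.", "A heading" and "Another sentence here", this returns three chunks: the first two lines joined with a space, then "A heading", then "Another sentence here". It only looks at line endings, so a period in the middle of a line does not split anything, and it will be fooled by abbreviations, superscripts and multiple columns. Two unpunctuated lines separated by more than the tolerance come out as two separate sentences.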

i.e.

A sentence here
Another sentence here

On Tue, Apr 21, 2015 at 4:21 PM, Hesham G. <[email protected]> wrote:

> Frank ,
>
> Thanks for explaining this.
>
> What I am trying to do is reading sentences from the PDF using
> TextPosition. Your explanation is clear and I can detect the new line using
> X & Y, but what if a sentence is written on 2 lines ? ... Reading the
> Y-coordinate for the second line will result with dealing with it as a new
> sentence instead of considering it a completion for the first line of the
> sentence.
>
>
> Best regards ,
> Hesham
>
> ------------------------------------------------------------------------
> Included message :
>
> Hi Hesham,
>
> There is no newline character in a PDF. Only printable characters are
> saved, each with its X and Y coordinates.
> If you sort the TextPositions by Y and X, you can detect 'newlines' by
> finding an increase in Y and a decrease in X. However, this isn't
> foolproof, since things like subscripts and superscripts are out of order
> when sorted by Y. Where there are multiple columns, this won't work.
>
> Frank
>
>
> On Wed, Apr 22, 2015 at 7:33 AM, Hesham G. <[email protected]> wrote:
>
> > Hello ,
> >
> > When reading PDF text using TextPosition, is there a way to know if the
> > current character is a new line character ?
> >
> > protected void processTextPosition( TextPosition text ) {
> >     // Prints space if this is a new line character in the PDF file.
> >     System.out.println( text.getCharacter() );
> > }
> >
> >
> > Best regards ,
> > Hesham

