Re: Reading text using TextPosition

2015-04-26 Thread Hesham G.
The NLP sentence segmenter was really a helpful idea. Thanks a lot, John & Frank. Best regards, Hesham. Included message: What have you got so far? Can you provide sample code to work with? On Wed, Apr 22, 2015 at 12:02
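The snippet does not say which NLP segmenter was used. As a rough illustration of the idea only, the sketch below splits already-extracted text into sentences with the JDK's rule-based `java.text.BreakIterator`. It is a stand-in, not the tool from the thread; the class name `SentenceSplitter` and the assumption that line breaks were already replaced by spaces are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: splits plain text into sentences using the JDK's
// rule-based BreakIterator (not a statistical NLP model).
final class SentenceSplitter {
    static List<String> split(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Walk the boundaries; each [start, end) span is one candidate sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }
}
```

Because the segmenter works on the text rather than on coordinates, a sentence that wraps across two lines is handled as long as the lines are joined with a space before splitting.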

Re: Reading text using TextPosition

2015-04-22 Thread John Hewson
> On 21 Apr 2015, at 13:21, Hesham G. wrote:
> Frank,
> Thanks for explaining this.
> What I am trying to do is to read sentences from the PDF using TextPosition.
> Your explanation is clear and I can detect the new line using X & Y, but what
> if a sentence is written on 2 lines?

Re: Reading text using TextPosition

2015-04-22 Thread Eric Douglas
What have you got so far? Can you provide sample code to work with?
On Wed, Apr 22, 2015 at 12:02 PM, Hesham G. wrote:
> Frank,
> I have handled TextPositions using X & Y coordinates as you have suggested
> to detect new lines. It works fine, but if a sentence is written on 2 lines
> I can't

Re: Reading text using TextPosition

2015-04-22 Thread Hesham G.
Frank, I have handled TextPositions using X & Y coordinates, as you suggested, to detect new lines. It works fine, but if a sentence is written on 2 lines I can't detect it. If you know a trick to detect that, it will help a lot. Best regards, Hesham -

Re: Reading text using TextPosition

2015-04-21 Thread Tilman Hausherr
On 21.04.2015 at 23:00, Hesham Gneady wrote: A sentence could also end with a question mark, exclamation mark, etc. I think there will be many cases to handle. I also wonder: when reading text from the book using PDFTextStripper, it can read the new line characters, right? TextPosition se

Re: Reading text using TextPosition

2015-04-21 Thread Hesham Gneady
A sentence could also end with a question mark, exclamation mark, etc. I think there will be many cases to handle. I also wonder: when reading text from the book using PDFTextStripper, it can read the new line characters, right? TextPosition seems to be reading the pdf text in a different wa

Re: Reading text using TextPosition

2015-04-21 Thread Eric Douglas
A proper sentence ends with a period, so text that is one character height below other text is assumed to be tacked onto the same sentence (with a space between). If you have the font, you know the font size, you should be able to calculate one character height. If sentences aren't ended with perio
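A minimal sketch of this heuristic, under stated assumptions: the `Line` class is a hypothetical stand-in for text already grouped into lines, Y grows down the page, and the 1.5 slack factor on the line height is an arbitrary choice. Question and exclamation marks are also treated as terminators, as raised elsewhere in the thread. It only decides whether a line break is also a sentence break; it does not split several sentences that share one line.

```java
import java.util.ArrayList;
import java.util.List;

final class SentenceJoiner {
    /** Hypothetical stand-in for one extracted line: its text and baseline Y. */
    static final class Line {
        final String text;
        final float y;
        Line(String text, float y) { this.text = text; this.y = y; }
    }

    /** True if the line ends with an assumed sentence terminator (. ? !). */
    static boolean endsSentence(String text) {
        String t = text.trim();
        return !t.isEmpty() && ".?!".indexOf(t.charAt(t.length() - 1)) >= 0;
    }

    /**
     * Joins consecutive lines (ordered top to bottom) into one string when the
     * previous line has no terminator and the next line sits roughly one
     * character height below it. lineHeight would come from the font size.
     */
    static List<String> join(List<Line> lines, float lineHeight) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Line prev = null;
        for (Line line : lines) {
            if (prev != null) {
                boolean adjacent = line.y - prev.y <= lineHeight * 1.5f; // assumed slack
                if (endsSentence(prev.text) || !adjacent) {
                    out.add(current.toString());
                    current.setLength(0);
                } else {
                    current.append(' '); // continuation: tack on with a space
                }
            }
            current.append(line.text.trim());
            prev = line;
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }
}
```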

Re: Reading text using TextPosition

2015-04-21 Thread Hesham G.
Frank, Thanks for explaining this. What I am trying to do is to read sentences from the PDF using TextPosition. Your explanation is clear and I can detect the new line using X & Y, but what if a sentence is written on 2 lines? ... Reading the Y-coordinate for the second line will result wit

Re: Reading text using TextPosition

2015-04-21 Thread Frank van der Hulst
Hi Hesham, There is no newline character in a PDF. Only printable characters are saved, each with its X and Y coordinates. If you sort the TextPositions by Y and X, you can detect 'newlines' by finding an increase in Y and a decrease in X. However, this isn't foolproof, since things like subscript
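The approach described above can be sketched roughly as follows. The `Glyph` class is a hypothetical stand-in for a PDFBox `TextPosition` (one piece of text plus its X and Y), Y is assumed to increase down the page, and `yTolerance` is an assumed threshold that keeps small vertical offsets, such as subscripts, on the same line. It is an illustration, not a robust layout analysis.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LineGrouper {
    /** Hypothetical stand-in for a TextPosition: text with its X and Y. */
    static final class Glyph {
        final String text;
        final float x;
        final float y;
        Glyph(String text, float x, float y) { this.text = text; this.x = x; this.y = y; }
    }

    /** Groups glyphs into lines: a jump in Y larger than yTolerance starts a new line. */
    static List<String> toLines(List<Glyph> glyphs, float yTolerance) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y)); // top to bottom

        List<List<Glyph>> lines = new ArrayList<>();
        float lineY = 0f;
        for (Glyph g : sorted) {
            if (lines.isEmpty() || g.y - lineY > yTolerance) {
                lines.add(new ArrayList<>()); // Y increased: treat as a 'newline'
                lineY = g.y;
            }
            lines.get(lines.size() - 1).add(g);
        }

        List<String> result = new ArrayList<>();
        for (List<Glyph> line : lines) {
            line.sort(Comparator.comparingDouble((Glyph g) -> g.x)); // left to right
            StringBuilder sb = new StringBuilder();
            for (Glyph g : line) {
                sb.append(g.text);
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

Grouping by Y first and sorting by X only within each group avoids a slightly offset glyph being ordered before the rest of its line.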