Hi, [email protected] wrote:
Take a look at the examples (src\main\java\org\apache\pdfbox\examples) and the utilities (src\main\java\org\apache\pdfbox\util) for sample code that does text extraction. The PDFTextStripper class lets you define the start and the end page, so if you extract the text of your PDFs page by page, you will always know the page number of every word you have extracted.
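A rough sketch of the page-by-page idea, in case it helps. The names PageByPageExtraction, PageTextSource and PageWord are made up for illustration and are not part of PDFBox. To keep the sketch runnable without a PDF file, the page text comes from a small interface; the comment in the middle shows how I would expect it to be wired to PDFTextStripper via setStartPage, setEndPage and getText, assuming a PDDocument that has already been loaded.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch only: all type names in this file are hypothetical, not PDFBox API.
final class PageByPageExtraction {

    /** Supplies the text of a single page; pages are numbered from 1, as in PDFTextStripper. */
    interface PageTextSource {
        String textOfPage(int page) throws IOException;
    }

    /** A word together with the number of the page it was extracted from. */
    static final class PageWord {
        final int page;
        final String word;

        PageWord(int page, String word) {
            this.page = page;
            this.word = word;
        }

        @Override
        public String toString() {
            return word + "@" + page;
        }
    }

    /** Extracts one page at a time, so every word can be tagged with its page number. */
    static List<PageWord> extractWords(PageTextSource source, int pageCount) throws IOException {
        List<PageWord> result = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            String text = source.textOfPage(page);
            // Naive whitespace tokenisation; real documents may need something smarter.
            for (String word : text.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    result.add(new PageWord(page, word));
                }
            }
        }
        return result;
    }

    // With PDFBox on the classpath, the source could be wired up roughly like this
    // (PDFTextStripper is in org.apache.pdfbox.util in 1.x, org.apache.pdfbox.text in 2.x):
    //
    //   PDFTextStripper stripper = new PDFTextStripper();
    //   PageTextSource source = page -> {
    //       stripper.setStartPage(page);        // restrict extraction to one page
    //       stripper.setEndPage(page);
    //       return stripper.getText(document);  // document: an already loaded PDDocument
    //   };
    //   List<PageWord> words = extractWords(source, document.getNumberOfPages());

    public static void main(String[] args) throws IOException {
        // Stand-in for a two-page PDF so the sketch runs without any input file.
        String[] fakePages = { "Hello PDF world", "second page" };
        PageTextSource source = page -> fakePages[page - 1];
        for (PageWord w : extractWords(source, fakePages.length)) {
            System.out.println(w);
        }
    }
}
```

Calling getText once per page is slower than a single pass over the whole document, but it keeps the bookkeeping trivial: the loop counter is the page number.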
BR Andreas Lehmkühler

