Yes, it is. This is almost what I am working on at the moment.
To save you some research time, have a look at PDFStreamEngine (more
precisely, override its processTextPosition method). If you manage to
extend PDFTextStripper instead, that may be better, since it handles
text flow even when the page is laid out in columns. I didn't manage
to do that, and PDFStreamEngine suits my needs for now.
In a PDF, text is split into groups of words, and sometimes even
single words are cut in half. While parsing the text flow you'll have
to keep a "back match" memory (a buffer of the most recently seen
text) so that you can match across those splits.
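A rough sketch of such a back match memory, in plain Java with no
PDFBox dependency. The class name FragmentMatcher and its feed method
are made up for this illustration; in practice the fragments would come
from whatever your processTextPosition override receives. It assumes a
case-sensitive match and ignores spacing between fragments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds a keyword in text that arrives in arbitrary fragments. */
final class FragmentMatcher {
    private final String keyword;
    // The "back match memory": the unconsumed tail of the text seen so far.
    private final StringBuilder tail = new StringBuilder();
    // Number of characters already dropped from the front of the buffer.
    private long consumed = 0;

    FragmentMatcher(String keyword) {
        if (keyword.isEmpty()) {
            throw new IllegalArgumentException("keyword must not be empty");
        }
        this.keyword = keyword;
    }

    /** Feeds the next fragment and returns the absolute start offset of each new match. */
    List<Long> feed(String fragment) {
        tail.append(fragment);
        List<Long> hits = new ArrayList<>();
        int from = 0;
        int idx;
        while ((idx = tail.indexOf(keyword, from)) >= 0) {
            hits.add(consumed + idx);
            from = idx + 1;
        }
        // Keep only keyword.length() - 1 characters: enough to complete a
        // match that straddles the next fragment, too few to report one twice.
        int keep = Math.min(tail.length(), keyword.length() - 1);
        int drop = tail.length() - keep;
        tail.delete(0, drop);
        consumed += drop;
        return hits;
    }
}
```

Feeding it "see pd", then "fb", then "ox site" reports nothing for the
first two fragments and one match for "pdfbox" on the third, even
though the word was cut in three pieces.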
You'll also need to deal with the graphics state (to get the text
coordinates), and you will have to hack it a bit to get the approximate
position of the words or sentences you are looking for (because of the
way the text flow is structured).
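To give an idea of the "approximate position" part: once you know
which glyphs make up the matched word, you can take the union of their
boxes. The sketch below is plain Java; Glyph and WordBounds are
hypothetical names standing in for whatever per-character position data
you collect, and it assumes (x, y) is the lower-left corner of each
glyph box, which you would need to check against your own coordinates.

```java
import java.util.List;

/** Hypothetical stand-in for the per-character position data you collect. */
final class Glyph {
    final String text;
    final float x, y, width, height; // assumed: (x, y) is the lower-left corner

    Glyph(String text, float x, float y, float width, float height) {
        this.text = text;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }
}

final class WordBounds {
    /** Returns {minX, minY, maxX, maxY} covering glyphs in [start, end). */
    static float[] boundsOf(List<Glyph> glyphs, int start, int end) {
        if (start < 0 || end > glyphs.size() || start >= end) {
            throw new IllegalArgumentException("empty or out-of-range glyph span");
        }
        float minX = Float.POSITIVE_INFINITY;
        float minY = Float.POSITIVE_INFINITY;
        float maxX = Float.NEGATIVE_INFINITY;
        float maxY = Float.NEGATIVE_INFINITY;
        for (Glyph g : glyphs.subList(start, end)) {
            minX = Math.min(minX, g.x);
            minY = Math.min(minY, g.y);
            maxX = Math.max(maxX, g.x + g.width);
            maxY = Math.max(maxY, g.y + g.height);
        }
        return new float[] {minX, minY, maxX, maxY};
    }
}
```

The resulting rectangle is only approximate (a word broken across two
lines would give one oversized box), but it is the kind of area you
would want a link to cover.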
Julien PLÉE
On 17 Sept. 2010 at 20:24, José Rodolfo Carrijo de Freitas wrote:
Hello,
Do you believe it is possible to read text from a PDF and wrap a piece
of that text with a link?
For example:
if it finds “pdfbox” in the text, it will link it to the pdfbox
website.
Thanks,
José Rodolfo Carrijo de Freitas
Systems Analyst
Softplan - Research and Development Department
Quality System Certified ISO 9001:2008
(48) 3027 8000 ext. 8359
http://www.softplan.com.br