Hi,
For anyone having the same problem as me, here is an answer I received from
Pablo in the PT group:
About your problem, I believe this is a limitation of Apache Tika [1],
which is used by the mapper-attachment plugin.
I believe that a search on Tika's PDF limitations or a question on their
Hi everybody,
I want to extract URLs from my PDF files. I use the mapper-attachment
plugin to index my PDF files.
To be able to run regex queries and extract all the URLs
present in a PDF file, I used the uax_url_email tokenizer:
curl -X PUT localhost:9200/test -d '{
settings