I'm not sure how you're using Tika, but the /rmeta endpoint in tika-server,
the -J option in tika-app, or the RecursiveParserWrapper in code should give
you what you need. Each of these returns one metadata object for the
container file and one for every embedded file.
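
For the tika-server route, here is a rough sketch in Python of how the
embedded file names could be pulled out of the /rmeta response. It assumes a
server running locally on port 9998, and it assumes the names show up under
the "resourceName" or "X-TIKA:embedded_resource_path" metadata keys. Both
helper functions are made up for illustration, and the key names can differ
between Tika versions, so check them against the JSON your server returns.

```python
import json
import urllib.request

# Assumed address of a locally running tika-server.
TIKA_RMETA_URL = "http://localhost:9998/rmeta"


def fetch_rmeta(pdf_path, url=TIKA_RMETA_URL):
    """PUT a file to /rmeta and return the parsed JSON list of metadata objects."""
    with open(pdf_path, "rb") as fh:
        request = urllib.request.Request(
            url,
            data=fh.read(),
            method="PUT",
            headers={"Accept": "application/json"},
        )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def embedded_file_names(rmeta_entries):
    """Return the names of embedded files; entry 0 is the container itself."""
    names = []
    for entry in rmeta_entries[1:]:
        # Key names are assumptions; verify them against your server's output.
        name = entry.get("resourceName") or entry.get("X-TIKA:embedded_resource_path")
        if name:
            names.append(name)
    return names
```

Calling embedded_file_names(fetch_rmeta("signed.pdf")) would then give a list
of names, and an empty list would suggest that no embedded files were found.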

On Tue, Jun 13, 2023 at 5:33 AM Willy T. Koch <t...@kochkonsult.no> wrote:

> Hi,
> Does Tika support detecting if a PDF has embedded files, and even better
> return an array of the file names?
>
> I was forwarded a "signed" PDF from a vendor that apparently makes their
> own signing solution. The PDF doesn't contain any standard PAdES properties
> that trigger the signature panel in Acrobat or hasSignature:true or any of
> the other signature properties in Tika.
>
> It consisted of embedding six html files with various technical info
> inside the PDF, like here, from the raw content:
>
> obj
> <</Names[(Appendix 1 Evidence Quality Framework.html) 99 0 R (Appendix 2
> Service Description.html) 101 0 R (Appendix 3 Evidence Log.html) 105 0 R
> (Appendix 4 Evidence of Time.html) 107 0 R (Appendix 5 Evidence of
> Intent.html) 109 0 R (Appendix 6 Digital Signature Documentation.html) 103
> 0 R (Evidence Quality of xxxxx E-signed Documents.html) 97 0 R]>>
> endobj
> 112 0 obj
>
> From a security perspective this would also be very useful when using Tika
> as a secure file gateway for file analysis and detecting malicious files.
>
> Thanks,
> Willy T. Koch
> Norway
>
>
