Re: Extract Embedded files from pdf using pdfbox in .NET application

Ramesh Shrestha Thu, 20 Jun 2013 02:03:36 -0700

Even after trying the annotation approach, I am still not able to extract the embedded/attached
doc file located on a page of the PDF.


On Tue, Jun 11, 2013 at 5:29 PM, Andreas Lehmkuehler <[email protected]> wrote:

> On 11.06.2013 07:06, Ramesh Shrestha wrote:
>
>> Thanks,
>>
>> The Java example link I provided should have been:
>> http://svn.apache.org/repos/asf/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractEmbeddedFiles.java
>>
>> But your suggestion WORKS.
>>
>> Now I am able to extract the attached file shown in the *attachments tab*,
>> but I *haven't been able to extract the attached file located on a page*.
>> In that case I get a null efTree:
>>
>>     PDDocumentNameDictionary namesDictionary =
>>         new PDDocumentNameDictionary(pdfDoc.getDocumentCatalog());
>>     PDEmbeddedFilesNameTreeNode efTree = namesDictionary.getEmbeddedFiles();
>>
>> So I am now working on it.
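A null `efTree` is expected when a PDF has no document-level embedded files, so any walk over the tree needs a null check first. The sketch below models that walk in plain Java. `NameNode` and `EmbeddedFileTreeWalk` are hypothetical stand-ins, not PDFBox classes; they only assume the general name-tree layout, where a node either holds its own name-to-file entries or delegates to child nodes.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a node of the embedded-files name tree (not a PDFBox class).
// A node either carries its own name -> file entries or delegates to child nodes;
// either field may be null.
class NameNode {
    final Map<String, byte[]> names;
    final List<NameNode> kids;

    NameNode(Map<String, byte[]> names, List<NameNode> kids) {
        this.names = names;
        this.kids = kids;
    }
}

class EmbeddedFileTreeWalk {
    // Collects every entry reachable from the given node.
    // A null root is treated as "no document-level embedded files".
    static Map<String, byte[]> collect(NameNode root) {
        Map<String, byte[]> out = new LinkedHashMap<>();
        walk(root, out);
        return out;
    }

    private static void walk(NameNode node, Map<String, byte[]> out) {
        if (node == null) {
            return; // nothing attached at document level
        }
        if (node.names != null) {
            out.putAll(node.names); // entries stored directly on this node
        }
        if (node.kids != null) {
            for (NameNode kid : node.kids) {
                walk(kid, out); // entries may live further down the tree
            }
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> leafEntries = new LinkedHashMap<>();
        leafEntries.put("report.doc", new byte[] {1, 2, 3});
        NameNode leaf = new NameNode(leafEntries, null);
        NameNode root = new NameNode(null, Collections.singletonList(leaf));

        System.out.println(collect(root).keySet()); // [report.doc]
        System.out.println(collect(null).isEmpty()); // true
    }
}
```

With PDFBox itself, the inputs to such a walk would come from `getNames()` and `getKids()` on the name tree node; how those are typed differs between PDFBox versions, so check the version being called from .NET.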
>>
> Embedded files are always document-related. If an embedded file is referenced
> on a single page, a file attachment annotation is used. Try something like this
> to get all annotations of a single page:
>
> List annotations = page.getAnnotations();
>
> The one you are looking for has to be an instance of the class
> org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationFileAttachment.
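To illustrate the check described above, the sketch below filters a page's annotation list with `instanceof` and keeps only file attachments. It takes a raw `List`, mirroring `List annotations = page.getAnnotations();`, which also avoids the generics problem seen from .NET. `AnnotationStub`, `FileAttachmentStub`, `LinkStub` and `PageAttachmentFilter` are hypothetical stand-ins for the PDFBox annotation classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for PDFBox annotation classes; only the
// subclass relationship matters for this sketch.
class AnnotationStub { }

class FileAttachmentStub extends AnnotationStub {
    final String fileName;

    FileAttachmentStub(String fileName) {
        this.fileName = fileName;
    }
}

class LinkStub extends AnnotationStub { }

class PageAttachmentFilter {
    // Accepts a raw List (as when generics are avoided) and keeps only
    // the file attachment annotations, in their original order.
    @SuppressWarnings("rawtypes")
    static List<FileAttachmentStub> fileAttachments(List annotations) {
        List<FileAttachmentStub> result = new ArrayList<>();
        if (annotations == null) {
            return result; // page without annotations
        }
        for (Object o : annotations) {
            if (o instanceof FileAttachmentStub) {
                result.add((FileAttachmentStub) o);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<AnnotationStub> pageAnnotations = new ArrayList<>();
        pageAnnotations.add(new LinkStub());
        pageAnnotations.add(new FileAttachmentStub("report.doc"));

        for (FileAttachmentStub a : fileAttachments(pageAnnotations)) {
            System.out.println(a.fileName); // report.doc
        }
    }
}
```

In PDFBox, the embedded stream would then presumably be reached through the matching annotation's file specification (`getFile()`), cast to `PDComplexFileSpecification`; treat that step as an assumption to verify against the PDFBox version in use.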
>
>> On Mon, Jun 10, 2013 at 7:38 PM, Andreas Lehmkuehler <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> On 10.06.2013 11:22, Ramesh Shrestha wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> I am developing a .NET application using PDFBox to extract metadata,
>>>> content and attached files from PDFs.
>>>>
>>>> I was able to extract metadata and content, but got stuck while extracting
>>>> attached/embedded files.
>>>>
>>>> I have a PDF with an embedded/attached doc file and want to retrieve that
>>>> file. I have gone through the Java example:
>>>> http://www.docjar.com/html/api/org/apache/pdfbox/examples/pdmodel/EmbeddedFiles.java.html
>>>>
>>>> But while trying to use it in .NET, I got the error "non generic type
>>>> 'java.util.Map' cannot be used with type arguments" in the following
>>>> code snippet:
>>>>
>>>> java.util.Map<String, COSObjectable> names = efTree.getNames();
>>>>
>>>> So I would be grateful if anybody could help me extract the file from the PDF.
>>>>
>>> I'm not a .NET expert and don't know what may cause that issue. But maybe
>>> it is a good idea to just omit the generics and try something like this:
>>>
>>> java.util.Map names = efTree.getNames();
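Without generics, every key and value comes back as `Object` and must be cast explicitly. The sketch below shows that pattern on a plain raw `java.util.Map`; `RawMapWalk` is a hypothetical helper, and the string values in the usage stand in for the file specification objects that `getNames()` would return.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RawMapWalk {
    // Copies a raw (non-generic) Map into a typed one, the way the result of
    // `java.util.Map names = efTree.getNames();` could be consumed when the
    // generic declaration is rejected. Each key is cast back to String.
    @SuppressWarnings("rawtypes")
    static Map<String, Object> toTyped(Map names) {
        Map<String, Object> result = new LinkedHashMap<>();
        if (names == null) {
            return result; // assumed: no entries stored directly on this node
        }
        for (Object o : names.entrySet()) {
            Map.Entry entry = (Map.Entry) o;
            // With PDFBox, the value would be cast to the file specification type here.
            result.put((String) entry.getKey(), entry.getValue());
        }
        return result;
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    public static void main(String[] args) {
        Map raw = new LinkedHashMap();
        raw.put("report.doc", "file spec placeholder");
        System.out.println(toTyped(raw).keySet()); // [report.doc]
    }
}
```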
>>>
>>>> Thanks in advance.
>>>>
>>> HTH
>>> Andreas Lehmkühler
>>>
>>
> BR
> Andreas Lehmkühler
>
>
