Hi Tim,

Thank you for your response.
Yes, I am using the /rmeta/form endpoint, and I am getting information on the
embedded files separately. However, I am not getting any information about
which parent each embedded file belongs to, so I cannot track the chain of
multilevel embedded files.
Is there a metadata property that tells us this?
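
To illustrate the kind of chain I want to reconstruct, here is a minimal
sketch. It assumes each entry in the JSON list returned by /rmeta carried a
path-like property; the key name `embedded_path` and the sample paths below
are hypothetical, not something I have confirmed in Tika's output.

```python
import posixpath

# Hypothetical key name; NOT a confirmed Tika metadata property.
PATH_KEY = "embedded_path"


def parent_map(entries):
    """Map each embedded entry's path to its parent's path.

    The empty string stands for the container (top-level) document.
    """
    parents = {}
    for entry in entries:
        path = entry.get(PATH_KEY, "")
        if not path:
            continue  # the container document has no parent
        parent = posixpath.dirname(path)
        parents[path] = "" if parent == "/" else parent
    return parents


def chain(path, parents):
    """Return the ancestors of `path`, from the container down to `path`."""
    out = [path]
    while out[-1] in parents:
        out.append(parents[out[-1]])
    return list(reversed(out))


# Made-up sample shaped like a list of per-document metadata objects.
sample = [
    {},  # container document
    {PATH_KEY: "/attachment.zip"},
    {PATH_KEY: "/attachment.zip/report.docx"},
    {PATH_KEY: "/attachment.zip/report.docx/image1.png"},
]

parents = parent_map(sample)
print(chain("/attachment.zip/report.docx/image1.png", parents))
```

If such a property exists, this is all I would need to rebuild the tree on
my side.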

On Sat, Oct 22, 2022, 16:06 Tim Allison <talli...@apache.org> wrote:

> 1) If you're using the /tika endpoint, embedded files are marked up as
> such in the xhtml output with div tags.  If you want full info on embedded
> files, I'd strongly encourage using the /rmeta endpoint.
>
> 2) We don't offer content marked up with json, but we do offer a text
> option, which can be returned in the X-Tika-Content tag in the json output.
> See https://cwiki.apache.org/confluence/display/TIKA/TikaServer for
> details on how to request text.
>
> This might also be useful:
> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
>
>
> On Fri, Oct 21, 2022 at 11:12 PM Chetan Bikire <chetab...@gmail.com>
> wrote:
>
>> 1) How does Tika server maintain the parent-child relationship between the
>> main document and its embedded documents (e.g. an email with multiple
>> attachments) after parsing? Is there any property or tag from which we can
>> tell these relationships?
>>
>> 2) After parsing any document, we get all tags in JSON format except the
>> *X-Tika-Content* tag, which is in HTML format. Is there any way to get
>> this in JSON format?
>>
>> Please Assist.
>> Thank You
>>
>
