I actually had an include metadata filter specified in tika-config. After
removing it I do get JSON output. However, is it possible to also force
"X-TIKA:content" to be in JSON format?
Right now it is XHTML, e.g., "X-TIKA:content": "<html
xmlns=\"http://www.w3.org/1999/xhtml\">\n<head>\n<meta
name=\"imagereader:NumImages\" content=\"1\" />\n<meta name=\"Transparency
Alpha\" ....
Reading the code, it looks like this is not possible: each parser extracts
the text using extractOutput(InputStream stream, XHTMLContentHandler xhtml)
... but maybe there is an easy way I can configure or implement a conversion
to JSON?
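
To make the question concrete, here is a rough client-side sketch of the
kind of conversion I have in mind. It is not a Tika feature: the function
name xhtml_to_dict and the {"meta": ..., "text": ...} output shape are
made up for illustration, and the sample string is a shortened, hand-written
stand-in for what comes back in "X-TIKA:content".

```python
import json
import xml.etree.ElementTree as ET

# Namespace that prefixes every element in the XHTML content string.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def xhtml_to_dict(xhtml: str) -> dict:
    """Turn an XHTML content string into a plain dict.

    Hypothetical helper: the output layout is an assumption chosen for
    this example, not something tika-server produces.
    """
    root = ET.fromstring(xhtml)

    # Collect <meta name="..." content="..."/> pairs from the document.
    # If a name occurs more than once, the last value wins.
    meta = {}
    for element in root.iter(XHTML_NS + "meta"):
        name = element.get("name")
        if name is not None:
            meta[name] = element.get("content", "")

    # Flatten the body to whitespace-normalized plain text.
    body = root.find(XHTML_NS + "body")
    text = "" if body is None else " ".join("".join(body.itertext()).split())

    return {"meta": meta, "text": text}


# Hand-written sample, shortened from the kind of output shown above.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n'
    '<meta name="imagereader:NumImages" content="1" />\n'
    '<title></title>\n</head>\n<body><p>hello world</p></body></html>'
)

print(json.dumps(xhtml_to_dict(sample), indent=2))
```

This works after the fact on the client, so it does not change what the
parsers emit. I would still prefer a server-side option if one exists.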

Thanks,
Cristi


On Mon, Jun 16, 2025 at 2:10 PM Cristian Zamfir <[email protected]>
wrote:

> Hello,
>
> Is it possible to obtain JSON output from tika-server out of the box? I
> can get XML output, but prefer JSON. If I set "Accept: application/json", I
> get a reply with just the content type, for
> instance: {"Content-Type":"image/png"}.
>
> Thanks,
> Cristi
>
>
