I actually had an include-metadata filter specified in tika-config. After removing it, I do get JSON output. However, is it possible to also force "X-TIKA:content" to be in JSON format? Right now it is XHTML, e.g.:

"X-TIKA:content": "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n<head>\n<meta name=\"imagereader:NumImages\" content=\"1\" />\n<meta name=\"Transparency Alpha\" ....

Reading the code, it looks like this is not possible, since each parser extracts the text using extractOutput(InputStream stream, XHTMLContentHandler xhtml) ... but maybe there is an easy way to configure or implement a conversion to JSON?
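
To make the question concrete, here is a rough sketch of the kind of client-side conversion I have in mind if there is no built-in option. It is only an illustration, not something Tika provides: the function name xhtml_content_to_dict and the output shape (a "meta" map plus a flattened "text" string) are my own assumptions, and it relies on the "X-TIKA:content" value being well-formed XHTML. It uses only the Python standard library.

```python
import json
import xml.etree.ElementTree as ET

# Namespace that appears on the <html> element of the XHTML content.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def xhtml_content_to_dict(xhtml: str) -> dict:
    """Turn an XHTML string (e.g. the "X-TIKA:content" value) into a plain dict.

    Hypothetical helper: the "meta"/"text" layout is an assumption made for
    this sketch, not a format defined by Tika.
    """
    root = ET.fromstring(xhtml)

    # Collect <meta name="..." content="..."/> entries from the document.
    meta = {}
    for element in root.iter(XHTML_NS + "meta"):
        name = element.get("name")
        if name is not None:
            meta[name] = element.get("content", "")

    # Flatten all text under <body> and collapse runs of whitespace.
    body = root.find(XHTML_NS + "body")
    text = "" if body is None else " ".join("".join(body.itertext()).split())

    return {"meta": meta, "text": text}


if __name__ == "__main__":
    # Made-up sample input, loosely modelled on the snippet above.
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n'
        '<meta name="imagereader:NumImages" content="1" />\n'
        "</head>\n<body><p>hello</p>\n<p>world</p></body></html>"
    )
    print(json.dumps(xhtml_content_to_dict(sample), indent=2))
```

This would run on the client after parsing the server's JSON reply, so it does not need any change to the parsers themselves, but a server-side setting would obviously be preferable.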

Thanks,
Cristi

On Mon, Jun 16, 2025 at 2:10 PM Cristian Zamfir <[email protected]> wrote:

> Hello,
>
> Is it possible to obtain json output from tika-server out of the box? I
> can get xml output, but prefer json. If I set "Accept: application/json", I
> get a reply with just the content type. for
> instance: {"Content-Type":"image/png"}.
>
> Thanks,
> Cristi
