[ https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731291#action_12731291 ]
Noble Paul commented on SOLR-1274:
----------------------------------

We are in the process of a release. New feature requests are not generally entertained. Shall we move it to 1.5?

> Provide multiple output formats in extract-only mode for tika handler
> ---------------------------------------------------------------------
>
>                 Key: SOLR-1274
>                 URL: https://issues.apache.org/jira/browse/SOLR-1274
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.4
>            Reporter: Peter Wolanin
>            Priority: Minor
>             Fix For: 1.4
>
>
> The proposed feature is to accept a URL parameter when using extract-only
> mode to specify an output format. This parameter might just overload the
> existing "ext.extract.only" so that one can optionally specify a format, e.g.
> false|true|xml|text where true and xml give the same response (i.e. xml
> remains the default)
> I had been assuming that I could choose among possible tika output
> formats when using the extracting request handler in extract-only mode
> as if from the CLI with the tika jar:
>    -x or --xml        Output XHTML content (default)
>    -h or --html       Output HTML content
>    -t or --text       Output plain text content
>    -m or --metadata   Output only metadata
> However, looking at the docs and source, it seems that only the xml
> option is available (hard-coded) in ExtractingDocumentLoader.java
> {code}
> serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8",
> true));
> {code}
> Providing at least a plain-text response seems to work if you change the
> serializer to a TextSerializer (org.apache.xml.serialize.TextSerializer).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
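
For illustration only, the overloaded "ext.extract.only" values described in the issue (false|true|xml|text, with true and xml equivalent) could be interpreted roughly as follows. This is a minimal sketch, not code from Solr: the class ExtractOnlyParam, the enum Format, and the method parse are hypothetical names, and treating a missing parameter as "not extract-only" and rejecting unknown values are assumptions the issue does not state.

```java
import java.util.Locale;

// Hypothetical helper, not part of Solr: interprets an overloaded
// "ext.extract.only" value as proposed in the issue (false|true|xml|text).
final class ExtractOnlyParam {

    // NONE means extract-only mode is off; XML and TEXT name the response format.
    enum Format { NONE, XML, TEXT }

    private ExtractOnlyParam() {}

    static Format parse(String raw) {
        if (raw == null) {
            // Assumption: an absent parameter behaves like "false".
            return Format.NONE;
        }
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "false":
                return Format.NONE;
            case "true": // per the issue, true and xml give the same response
            case "xml":
                return Format.XML;
            case "text":
                return Format.TEXT;
            default:
                // Assumption: unknown values are rejected rather than ignored.
                throw new IllegalArgumentException(
                        "Unsupported ext.extract.only value: " + raw);
        }
    }
}
```

A caller could then branch on the returned value to pick between the XMLSerializer and the TextSerializer mentioned in the issue; that wiring is left out here because it depends on the handler's internals.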