Re: Select tika output for extract-only?

Peter Wolanin Mon, 13 Jul 2009 12:12:43 -0700

Ok, thanks. I played with it enough to get plain text out at least,
but I'll wait for the resolution of SOLR-284.


-Peter

On Sun, Jul 12, 2009 at 9:20 AM, Yonik Seeley<yo...@lucidimagination.com> wrote:
> Peter, I'm hacking up solr cell right now, trying to simplify the
> parameters and fix some bugs (see SOLR-284)
> A quick patch to specify the output format should make it into 1.4 -
> but you may want to wait until I finish.
>
> -Yonik
> http://www.lucidimagination.com
>
> On Sat, Jul 11, 2009 at 5:39 PM, Peter Wolanin<peter.wola...@acquia.com> wrote:
>> I had been assuming that I could choose among possible tika output
>> formats when using the extracting request handler in extract-only mode
>> as if from the CLI with the tika jar:
>>
>>    -x or --xml        Output XHTML content (default)
>>    -h or --html       Output HTML content
>>    -t or --text       Output plain text content
>>    -m or --metadata   Output only metadata
>>
>> However, looking at the docs and source, it seems that only the xml
>> option is available (hard-coded) in ExtractingDocumentLoader:
>>
>> serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
>>
>> In addition, it seems that the metadata is always appended to the response.
>>
>> Are there any open issues relating to this, or opinions on whether
>> adding additional flexibility to the response format would be of
>> interest for 1.4?
>>
>> Thanks,
>>
>> Peter
>>
>> --
>> Peter M. Wolanin, Ph.D.
>> Momentum Specialist,  Acquia, Inc.
>> peter.wola...@acquia.com
>>
>



-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia, Inc.
peter.wola...@acquia.com
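
For readers following the thread, here is a rough sketch of how an output
format could be selected instead of hard-coding the XML serializer. It is
not the SOLR-284 patch and does not use any Solr or Tika API: it relies only
on the JDK's SAX transformer classes, and the class name OutputFormatSketch,
the methods handlerFor and render, and the idea of passing the format in as
a string are all assumptions made for illustration. A parser that emits SAX
events could write to the returned handler in place of the fixed serializer.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper, not part of Solr or Tika.
class OutputFormatSketch {

    // Build a SAX handler that serializes events using the given
    // output method: "xml", "html", or "text".
    static TransformerHandler handlerFor(String method, StringWriter out)
            throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, method);
        handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        handler.setResult(new StreamResult(out));
        return handler;
    }

    // Feed a tiny hand-written document (<p>hello</p>) through the handler.
    // In real use the SAX events would come from the document parser.
    static String render(String method) throws Exception {
        StringWriter out = new StringWriter();
        ContentHandler handler = handlerFor(method, out);
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        char[] text = "hello".toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement("", "p", "p");
        handler.endDocument();
        return out.toString();
    }
}
```

Calling OutputFormatSketch.render("text") drops the markup and keeps only
the character data, while render("xml") keeps the element tags. Returning
metadata only, as with the CLI's -m option, would need separate handling
that this sketch does not cover.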
