Hello. I am updating Rika (https://github.com/keithrbennett/rika, a JRuby
wrapper for Tika) to work with current Tika versions and to add a
command-line executable.

I noticed that Rika opens the document's input stream twice: once to call
Tika#detect to get its media type, and again to do the parsing. Is this
detect call unnecessary? I noticed a Content-Type in the parsed metadata
with the same value as the one returned by Tika#detect. Is Content-Type
at least as reliable as Tika#detect?

Thanks for any help on this. Also, if you have any interest in Rika, feel
free to let me know. It would be great to talk to any current or
prospective users of the gem.

- Keith
