Hello. I am updating Rika (https://github.com/keithrbennett/rika, a JRuby wrapper for Tika) to work with current Tika versions and to add a command-line executable.
I noticed that Rika opens the document's input stream twice: once to call Tika#detect to get its media type, and again to do the parsing. Is this detect call unnecessary? The parsed metadata includes a Content-Type entry with the same value that Tika#detect returns. Is Content-Type at least as reliable as Tika#detect?
Thanks for any help on this. Also, if you have any interest in Rika, feel free to let me know. It would be great to talk to any current or prospective users of the gem.
- Keith