Tim, thank you so much for responding. Can I rely on Content-Type to always
be populated by a parse?

- Keith


On Mon, Aug 14, 2023 at 10:09 PM Tim Allison <talli...@apache.org> wrote:

> Content-Type may be more reliable/specific because, for some file types,
> the parser refines the media type during the parse.  For example, the PDF
> parser changes application/pdf -> application/illustrator (or similar?) if
> it determines that the file is a PDF-based Adobe Illustrator file.
> The detector doesn't do a full parse, so it will only return
> "application/pdf".
>
> On Wed, Aug 9, 2023 at 9:54 AM Keith Bennett <keithrbenn...@gmail.com>
> wrote:
>
>> Hello. I am updating Rika (https://github.com/keithrbennett/rika, a JRuby
>> wrapper for Tika) to work with current Tika versions and to add a
>> command-line executable.
>>
>> I noticed that Rika opens the document's input stream twice: once to call
>> Tika#detect to get its media type, and again to do the parsing. Is this
>> detect call unnecessary? I noticed a Content-Type in the parsed metadata,
>> with the same value as the one returned by Tika#detect. Is
>> Content-Type at least as reliable as Tika#detect?
>>
>> Thanks for any help on this. Also, if you have any interest in Rika, feel
>> free to let me know. It would be great to talk to any current or
>> prospective users of the gem.
>>
>> - Keith

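Pending an answer to whether a parse always populates Content-Type, one defensive option for a wrapper like Rika is to take the media type from the parse and fall back to detection only when Content-Type is missing. The Ruby sketch below is not Rika's or Tika's API: it assumes the parsed metadata is available as a plain hash keyed by strings, and the optional block is a hypothetical stand-in for a Tika#detect call. It also strips MIME parameters, because a Content-Type value may carry them (for example `text/plain; charset=UTF-8`), which would otherwise break comparisons with a bare media type.

```ruby
# A minimal sketch, not Rika's or Tika's API. It assumes the parse result
# exposes its metadata as a plain Ruby hash keyed by strings such as
# 'Content-Type' (a hypothetical shape chosen for illustration).

# Drop MIME parameters such as "; charset=UTF-8" so the result is a bare
# media type like "application/pdf".
def base_media_type(content_type)
  content_type.split(';').first.to_s.strip
end

# Prefer the Content-Type recorded by the parse, since the parser may refine
# the type (e.g. a PDF-based Illustrator file). If the parse left it missing
# or blank, call the optional block: a hypothetical detector fallback that
# would cost a second pass over the input.
def resolve_media_type(parsed_metadata)
  value = parsed_metadata['Content-Type']
  if value.nil? || value.strip.empty?
    return block_given? ? yield : nil
  end

  base_media_type(value)
end
```

With this shape, `resolve_media_type({ 'Content-Type' => 'text/plain; charset=UTF-8' })` yields `"text/plain"`, and the detector block only runs when the parse produced no usable Content-Type, so the second stream open is paid for only in that case.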