Tim, thank you so much for responding. Can I rely on Content-Type to always be populated by a parse?
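
If it isn't guaranteed, I could guard against a missing value along these lines. This is only a rough sketch: resolve_content_type is a hypothetical helper, not something in Rika or Tika, and it assumes the parsed metadata has already been copied into a plain Ruby Hash keyed by strings such as 'Content-Type'. The block stands in for a separate detect call and runs only when the parse left no usable value.

```ruby
# Hypothetical helper (not part of Rika or Tika).
# metadata: a Hash built from the parse result, keyed by strings (assumption).
# The optional block is the fallback, e.g. a detect call on a reopened stream.
def resolve_content_type(metadata)
  value = metadata['Content-Type']
  # Some metadata fields can hold several values; take the first one.
  value = value.first if value.is_a?(Array)
  value = value.to_s.strip
  return value unless value.empty?

  # Nothing usable from the parse: run the fallback if one was given.
  block_given? ? yield : nil
end

# Parse reported a type, so the fallback block is never called.
resolve_content_type({ 'Content-Type' => 'application/illustrator' }) { 'application/pdf' }

# Parse reported nothing, so the fallback result is used.
resolve_content_type({}) { 'application/pdf' }
```

That way the stream would only be opened a second time for documents where the parse did not report a type.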
- Keith

On Mon, Aug 14, 2023 at 10:09 PM Tim Allison <talli...@apache.org> wrote:

> Content-Type may be more reliable/specific because for some file types,
> the parser updates the file type during the parse. For example the PDF
> parser updates application/pdf -> application/illustrator (or similar?) if
> the parser determines that the file is a PDF-based Adobe Illustrator file.
> The detector doesn't do a full parse so it will only return
> "application/pdf".
>
> On Wed, Aug 9, 2023 at 9:54 AM Keith Bennett <keithrbenn...@gmail.com>
> wrote:
>
>> Hello. I am updating Rika (https://github.com/keithrbennett/rika, JRuby
>> wrapper for Tika) to work with current Tika versions and to add a command
>> line executable.
>>
>> I noticed that Rika opens the document's input stream twice; once to call
>> Tika#detect to get its media type, and again to do the parsing. Is this
>> detect call unnecessary? I noticed a Content-Type in the parsed metadata,
>> which has the same value as the value returned by Tika#detect. Is
>> Content-Type at least as reliable as Tika#detect?
>>
>> Thanks for any help on this. Also, if you have any interest in rika, feel
>> free to let me know. It would be great to talk to any current or
>> prospective users of the gem.
>>
>> - Keith