Chris - I agree. I now see the wisdom of making application/octet-stream the default MIME type, possibly with the ability to override that.
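To make the "default with override" idea concrete, here is a rough sketch. The class and method names are made up for illustration and are not existing Tika API; the only assumption is that detection either yields a type or yields nothing.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Tika's API.
public class DefaultMimeType {
    // The proposed built-in default when nothing more specific is known.
    static final String FALLBACK = "application/octet-stream";

    // Returns the detected type if there is one, otherwise the
    // user-configured default if one was supplied, otherwise FALLBACK.
    static String resolve(Optional<String> detected, Optional<String> configuredDefault) {
        return detected.orElseGet(() -> configuredDefault.orElse(FALLBACK));
    }

    public static void main(String[] args) {
        // Nothing detected, no override: falls back to the built-in default.
        System.out.println(resolve(Optional.empty(), Optional.empty()));
        // Nothing detected, but the user overrode the default.
        System.out.println(resolve(Optional.empty(), Optional.of("text/plain")));
        // A detected type always wins over either default.
        System.out.println(resolve(Optional.of("application/pdf"), Optional.of("text/plain")));
    }
}
```

The point of the ordering is that an override only changes what happens when detection comes up empty; it never masks a type that was actually detected.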
In addition, though, I think we want to consider what Tika should do with such a byte stream. One option is to run it through strings to get ASCII text. Another is to have it fail the parse so that the user can be notified that Tika could not find a (definitely) suitable parser. Another might be to parse it as an empty string (if, for example, the text is known to be in Chinese, and the output of strings would be meaningless random garbage). In the future, maybe the user would consider it important enough to write a custom parser for application/octet-stream, and plug it into Tika.

- Keith

Chris Mattmann wrote:
>
> Hi Folks,
>
> Thinking this through more, it probably makes a lot of sense for the
> Default MIME TYPE in Tika to be application/octet-stream.
>

--
View this message in context: http://www.nabble.com/Default-MIME-Type--tf4609978.html#a13185693
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
