On Sun, 15 Jan 2012, Public Network Services wrote:
I am using Tika 0.9 to detect various types of files and formats, but not getting the expected behavior.
I'd suggest you try a recent nightly build and see if that helps. We've done quite a bit of detection work since 0.9.
- For various application files (e.g., images or MS-Office files) the detected type is the generic "application/octet-stream", as opposed to the specific MIME type for the application.
For office file formats to be properly detected, you'll also need the tika-parsers jar (+ dependencies) on your classpath, so that the extra detectors are present.
The detection is made via a simple call to new Tika().detect(inputStream);
It's worth double checking with the tika-app jar and the --detect flag. That'll let you verify whether a detection problem is really a Tika one, or a problem with your setup (e.g. missing jars).
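If you want to rule out the missing-jars case from inside your own application, a rough sketch along these lines might help. It only asks the class loader whether a class is visible, and doesn't call Tika itself. Note that ClasspathCheck and isOnClasspath are made-up names for illustration, and the detector class name is an assumption based on the Tika 1.x layout of the parsers jar, so it may well differ in the version you end up using.

```java
public class ClasspathCheck {

    // Returns true if the named class is visible to this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, don't run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but one of its own dependencies is missing
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            // The facade class from tika-core
            "org.apache.tika.Tika",
            // ASSUMPTION: container detector name as in the Tika 1.x parsers jar
            "org.apache.tika.parser.microsoft.POIFSContainerDetector"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same classpath as your application. If the second class comes back as MISSING, the generic "application/octet-stream" result is most likely a setup problem rather than a Tika one.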
Nick
