Hi there,

Are there any plans for improving detection for IWork files?
My team noticed that *detecting* Pages and Numbers files for IWork 13 doesn't work (it returns vnd.apple.unknown.13), but if you *parse* the files, the resulting metadata has the correct MIME type. It looks like Tika just trusts the file extension (guessTypeByExtension) for IWork 13 Pages and Numbers files.

I found a bunch of related tickets in the Tika backlog, but they all seem dormant or outdated, so I wanted to see if there was anything on the roadmap 🙂

https://issues.apache.org/jira/browse/TIKA-918
https://issues.apache.org/jira/browse/TIKA-1358
https://issues.apache.org/jira/browse/TIKA-2474
https://issues.apache.org/jira/browse/TIKA-2981

A couple of related follow-up questions:

- Is it possible to detect IWork files *without* parsing them? I assume it's difficult (if not impossible) since it's a ZipStream.
- It looks like if *detect.apple.IWorkDetector:detect* (used in the *DefaultZipContainerDetector*) returns *UNKNOWN13*, the *parser.iwork.iwana.IWork13PackageParser:parse* method will use the file extension as the MIME type. This doesn't appear to be the case for the original IWork or IWork 18 formats (neither *IWorkPackageParser* nor *IWork18PackageParser* has an *UNKNOWN* value for type). Is it possible to align them more closely?
- Is there a reason for *detect.apple.IWorkDetector:detect* itself not to use the file extension as the MIME type if detection fails, rather than reporting *UNKNOWN*?

Apologies for a fairly lengthy question, and thanks in advance for your time and response!

Kind regards,
Matt
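
P.S. To make the last question concrete, here is a minimal sketch of the fallback I have in mind. It is plain Java, not Tika code: the class name IWork13Fallback, the resolve method and the extension map are all hypothetical, and the pages/numbers/keynote type names are my assumption based on the vnd.apple.unknown.13 pattern rather than something I have checked against Tika's registry.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Tika: illustrates "fall back to the
// file extension when content-based detection reports UNKNOWN13".
class IWork13Fallback {

    // The type we currently get back from detection for Pages/Numbers files.
    static final String UNKNOWN13 = "application/vnd.apple.unknown.13";

    // Assumed extension-to-type mapping (type names are a guess that
    // follows the vnd.apple.*.13 naming pattern).
    static final Map<String, String> BY_EXTENSION = Map.of(
            "pages", "application/vnd.apple.pages.13",
            "numbers", "application/vnd.apple.numbers.13",
            "key", "application/vnd.apple.keynote.13");

    // Returns the detected type unchanged unless it is UNKNOWN13 and the
    // file name carries an extension we recognise.
    static String resolve(String detected, String fileName) {
        if (!UNKNOWN13.equals(detected) || fileName == null) {
            return detected;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return detected; // no extension to fall back on
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, detected);
    }

    public static void main(String[] args) {
        System.out.println(resolve(UNKNOWN13, "report.pages"));
        System.out.println(resolve(UNKNOWN13, "report.bin"));
    }
}
```

The idea is that a type found from the content always wins, and the extension is only consulted when detection would otherwise report UNKNOWN13, which seems to match what the IWork 13 parser already does.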
