Hi there,

Are there any plans to improve detection of iWork files?

My team noticed that *detecting* iWork 13 Pages and Numbers files doesn't
work (it returns vnd.apple.unknown.13), but if you *parse* the files, the
resulting metadata has the correct MIME type. It looks like, when parsing,
Tika just trusts the file extension (guessTypeByExtension) for these files.

I found a number of related tickets in the Tika backlog, but they all seem
dormant or outdated, so I wanted to check whether anything is on the
roadmap 🙂
https://issues.apache.org/jira/browse/TIKA-918
https://issues.apache.org/jira/browse/TIKA-1358
https://issues.apache.org/jira/browse/TIKA-2474
https://issues.apache.org/jira/browse/TIKA-2981

A couple of related follow-up questions:

   - Is it possible to detect iWork files *without* parsing them? I assume
   it's difficult (if not impossible), since the file is a ZIP container
   that has to be read as a stream.
   - It looks like if *detect.apple.IWorkDetector:detect* (used in
   *DefaultZipContainerDetector*) returns *UNKNOWN13*, the
   *parser.iwork.iwana.IWork13PackageParser:parse* method falls back to the
   file extension for the MIME type. This doesn't appear to be the case for
   the original iWork or iWork 18 formats (neither *IWorkPackageParser* nor
   *IWork18PackageParser* has an *UNKNOWN* type value). Is it possible to
   align them more closely?
   - Is there a reason for *detect.apple.IWorkDetector:detect* itself not to
   fall back to the file extension when detection fails, rather than
   reporting *UNKNOWN*?
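To make the last question more concrete, here's a rough sketch of the kind of fallback I have in mind. It's plain Java rather than a patch against Tika: the class and method names are made up, and the .13 MIME strings for Pages, Numbers and Keynote are my assumption of what Tika's iWork 13 types look like, so please treat it as illustrative only.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch (not Tika code): if zip-container detection only
 * manages to report the generic iWork 13 type, fall back to a type
 * guessed from the file extension instead of returning "unknown".
 */
final class IWorkExtensionFallback {

    // The "unknown" type we see from detection for iWork 13 files.
    static final String UNKNOWN_13 = "application/vnd.apple.unknown.13";

    // ASSUMED extension-to-type mapping; the .13 type strings should be
    // checked against Tika's actual iWork 13 document types.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pages", "application/vnd.apple.pages.13",
            "numbers", "application/vnd.apple.numbers.13",
            "key", "application/vnd.apple.keynote.13");

    private IWorkExtensionFallback() {}

    /** Returns the detected type, or an extension-based guess if detection gave UNKNOWN_13. */
    static String resolve(String detectedType, String fileName) {
        if (!UNKNOWN_13.equals(detectedType) || fileName == null) {
            return detectedType; // detection succeeded, or there is no name to fall back on
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return detectedType; // no usable extension
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        // Unrecognised extensions keep the original "unknown" result.
        return BY_EXTENSION.getOrDefault(ext, detectedType);
    }
}
```

The idea is that detection and parsing would then report the same type for the same file, instead of only the parser consulting the extension.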

Apologies for the lengthy question, and thanks in advance for your time
and response!

Kind regards,

Matt

-- 
<https://www.canva.com/>
Empowering the world to design
We're hiring, apply here <https://www.canva.com/careers/>! Check out the
latest news and learnings from our team on the Canva Newsroom
<https://www.canva.com/newsroom/news/>.