Tim Allison created TIKA-1958:
---------------------------------

Summary: Add mime detection and lightweight parsers for Office 2003 Word and Excel formats
Key: TIKA-1958
URL: https://issues.apache.org/jira/browse/TIKA-1958
Project: Tika
Issue Type: Improvement
Reporter: Tim Allison
Priority: Minor
Over on the POI user list, a user asked whether we support Excel 2003 XML (SpreadsheetML) files. It would be neat if we could add mime detection and a "good enough" parser for the Office 2003 XML formats: Excel 2003 XML spreadsheets (SpreadsheetML) and Word 2003 XML documents (WordprocessingML). This could be a great task for someone wanting to get started contributing to Tika.

References:
https://mail-archives.apache.org/mod_mbox/poi-user/201604.mbox/%3Calpine.BSO.2.20.1604210825140.22929%40ref.nmedia.net%3E
https://en.wikipedia.org/wiki/Microsoft_Office_XML_formats
https://msdn.microsoft.com/en-us/library/bb226687(v=office.11).aspx
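A minimal sketch of the mime-detection half, assuming detection by root element: Word 2003 XML documents have a `wordDocument` root in the `http://schemas.microsoft.com/office/word/2003/wordml` namespace, and Excel 2003 XML spreadsheets have a `Workbook` root in the `urn:schemas-microsoft-com:office:spreadsheet` namespace. It uses only the JDK's StAX API, not Tika's `Detector` interface. The class name `Office2003XmlSniffer` and the two media-type strings are hypothetical and may not match what Tika ends up registering.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sniffer for the Office 2003 XML formats, keyed on the root element. */
public class Office2003XmlSniffer {

    // Root-element namespaces of the two Office 2003 XML formats.
    static final String WORDML_NS = "http://schemas.microsoft.com/office/word/2003/wordml";
    static final String SPREADSHEETML_NS = "urn:schemas-microsoft-com:office:spreadsheet";

    // Hypothetical media-type labels (assumption; not necessarily Tika's registered names).
    static final String WORDML_TYPE = "application/vnd.ms-wordml";
    static final String SPREADSHEETML_TYPE = "application/vnd.ms-spreadsheetml";

    /** Returns a media type if the root element matches, otherwise null. */
    public static String detect(InputStream in) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Never resolve DTDs or external entities while sniffing untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = null;
        try {
            reader = factory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                // Skip the XML declaration, comments and the <?mso-application?> PI;
                // only the first (root) element decides the type.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return classifyRoot(reader.getNamespaceURI(), reader.getLocalName());
                }
            }
            return null;
        } catch (XMLStreamException e) {
            return null; // not well-formed XML, so not an Office 2003 XML file
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do while sniffing
                }
            }
        }
    }

    /** Convenience overload for in-memory strings. */
    public static String detect(String xml) {
        return detect(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    static String classifyRoot(String ns, String localName) {
        if (WORDML_NS.equals(ns) && "wordDocument".equals(localName)) {
            return WORDML_TYPE;
        }
        if (SPREADSHEETML_NS.equals(ns) && "Workbook".equals(localName)) {
            return SPREADSHEETML_TYPE;
        }
        return null;
    }

    public static void main(String[] args) {
        String xls = "<?xml version=\"1.0\"?>\n<?mso-application progid=\"Excel.Sheet\"?>\n"
                + "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\"/>";
        String doc = "<?xml version=\"1.0\"?>\n<?mso-application progid=\"Word.Document\"?>\n"
                + "<w:wordDocument xmlns:w=\"" + WORDML_NS + "\"/>";
        System.out.println(detect(xls));
        System.out.println(detect(doc));
    }
}
```

Keying on namespace plus local name, rather than on the `<?mso-application progid="..."?>` processing instruction, is a design choice: the PI is a useful extra hint, but the root element is what a namespace-aware parser sees reliably. Inside Tika itself, a rule like this would more likely be declared with a `root-XML` entry in the mime-types definitions than written as Java code.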