[ https://issues.apache.org/jira/browse/TIKA-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoni Mylka closed TIKA-812.
-----------------------------

    Resolution: Fixed
    Fix Version/s: 1.1

Committed tika-812-ver2.patch in r1220687.

> Improve the detection of Works Spreadsheet 7.0 files
> ----------------------------------------------------
>
>                 Key: TIKA-812
>                 URL: https://issues.apache.org/jira/browse/TIKA-812
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.1
>            Reporter: Antoni Mylka
>             Fix For: 1.1
>
>         Attachments: testWORKSSpreadsheet7.0.xlr, tika-812-ver2.patch,
> tika-812.patch
>
>
> This was originally part of ver3 of my patch submitted to TIKA-806.
> Works Spreadsheet files are weird. Versions up to 3.0 used a Quattro Pro
> magic, version 4.0 used its own magic, while version 7.0 (probably later ones
> as well) use an OLE2 structure and an MS Office magic. The 7.0 files also
> contain an entry labelled "Workbook". In Tika this makes both MimeTypes (due
> to the quirk recently discussed in TIKA-806) and the POIFSContainerDetector
> label them as Excel.
> "Conceptually" they should be vnd.ms-works, but "technically" they are
> vnd.ms-excel. A special media type seems like a good compromise, similar in
> vein to the compromise we reached with TIKA-798.
> I would like to mark them with a new media type:
> "application/x-tika-msworks-spreadsheet". It would be a subclass of
> vnd.ms-excel so that:
> # With pure MimeTypes and no name, ms-excel could be returned.
> # With MimeTypes with name and data, the correct type could be returned
> # With POIFSContainerDetector the correct type could be returned
> # They can also be added to the list of types supported by ExcelParser as it
> seems to be able to get some content from them
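
The subclass arrangement proposed above can be illustrated with a small toy model. This is only a sketch and not Tika's API: the `SUPERTYPES` and `GLOBS` tables and the `is_specialization` and `detect` functions are hypothetical names, the `.xlr` glob mapping is assumed from the attachment's file name, and the magic result is passed in as a plain string instead of being computed from file bytes. It covers only the first two cases in the list (detection without a name, and detection with name and data).

```python
# Toy model of the proposed media type hierarchy. NOT Tika's API:
# all names below are hypothetical and exist only for illustration.
EXCEL = "application/vnd.ms-excel"
WORKS_SHEET = "application/x-tika-msworks-spreadsheet"

# child type -> parent type (the proposal makes the Works type a subclass of Excel)
SUPERTYPES = {WORKS_SHEET: EXCEL}

# Assumed file-name glob table; the .xlr mapping is an assumption
# based on the attachment name testWORKSSpreadsheet7.0.xlr.
GLOBS = {".xlr": WORKS_SHEET, ".xls": EXCEL}


def is_specialization(candidate, base):
    """Return True if candidate equals base or descends from it."""
    current = candidate
    while current is not None:
        if current == base:
            return True
        current = SUPERTYPES.get(current)
    return False


def detect(magic_type, name=None):
    """Combine a magic-based result with an optional file name.

    Without a name, the magic result is returned unchanged.
    With a name, the glob-based type wins only when it is a
    specialization of the magic result; otherwise magic wins.
    """
    if name is None:
        return magic_type
    dot = name.rfind(".")
    ext = name[dot:].lower() if dot != -1 else ""
    glob_type = GLOBS.get(ext)
    if glob_type is not None and is_specialization(glob_type, magic_type):
        return glob_type
    return magic_type
```

In this model, `detect(EXCEL)` falls back to the Excel type, while `detect(EXCEL, "testWORKSSpreadsheet7.0.xlr")` returns the more specific Works spreadsheet type because it is declared as a subclass of the magic result. A name whose glob type is unrelated to the magic result is ignored.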