[ https://issues.apache.org/jira/browse/TIKA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated TIKA-989:
------------------------------------

    Attachment: TIKA-989.patch

New patch ... I think it's ready.

Instead of hardwiring the relationship ID into the suggested embedded RESOURCE_NAME, I created a new TikaMetadataKeys.EMBEDDED_RELATIONSHIP_ID which I set in the Metadata.

And I fixed TikaCLI -z to prefix the filename it writes each embedded file to, with the relationship ID.

> We don't extract a placeholder for documents embedded in a Word OOXML (.docx) document
> --------------------------------------------------------------------------------------
>
>                 Key: TIKA-989
>                 URL: https://issues.apache.org/jira/browse/TIKA-989
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 1.3
>
>         Attachments: TIKA-989.patch, TIKA-989.patch
>
>
> In TIKA-956 we fixed the Word parser so that at the point where an embedded
> document appears, we output a <div class="embedded" id="_XXX"/> tag.
> It would be nice to do this for documents embedded in OOXML documents too.
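
To make the filename change concrete, here is a minimal, self-contained sketch of how a -z style extractor might build the output name from the metadata. It is not the code in the attached patch: the class name, the helper outputName, the "_" separator, the "file" fallback and the two key strings are all assumptions made for illustration, and a plain Map stands in for Tika's Metadata class so the example runs without Tika on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

public class EmbeddedNameSketch {
    // Placeholder key strings (assumed); real code would use the Tika
    // constants for the resource name and the embedded relationship ID.
    static final String RESOURCE_NAME = "resourceName";
    static final String EMBEDDED_RELATIONSHIP_ID = "embeddedRelationshipId";

    // Hypothetical helper: prefix the suggested name with the relationship ID
    // when one is present, so the written file can be matched to the
    // placeholder emitted in the parent document.
    static String outputName(Map<String, String> metadata) {
        String name = metadata.get(RESOURCE_NAME);
        if (name == null || name.isEmpty()) {
            name = "file"; // assumed fallback when no name is suggested
        }
        String relId = metadata.get(EMBEDDED_RELATIONSHIP_ID);
        if (relId == null || relId.isEmpty()) {
            return name; // no relationship ID, keep the name unchanged
        }
        return relId + "_" + name; // "_" separator is an assumption
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(RESOURCE_NAME, "image1.png");
        metadata.put(EMBEDDED_RELATIONSHIP_ID, "rId7");
        System.out.println(outputName(metadata)); // prints "rId7_image1.png"
    }
}
```

Keeping the relationship ID in its own metadata field, rather than baking it into the resource name, leaves it to each consumer to decide whether and how to use it.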