[ https://issues.apache.org/jira/browse/TIKA-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch resolved TIKA-1353.
------------------------------
       Resolution: Fixed
    Fix Version/s: 1.6

I've fixed those TODOs in r1605124. Now, if a TikaInputStream is given, the ODF file is processed in a random access way, with the metadata handled first. If it's just a regular stream, then the previous "iterate in turn" behaviour continues.

> OpenDocumentParser doesn't correctly process metadata
> -----------------------------------------------------
>
>                 Key: TIKA-1353
>                 URL: https://issues.apache.org/jira/browse/TIKA-1353
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata, parser
>    Affects Versions: 1.5
>            Reporter: Steve R
>             Fix For: 1.6
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When using OpenDocumentParser, the metadata isn't set correctly. When using
> it to write an html file, the only metadata that it knows about is content
> type because it is set ahead of time.
> The problem is that when iterating over the zip contents, meta.xml isn't
> processed before content.xml. The metadata set on the parse object is correct
> after parse() returns, however the contents of the resulting html file is
> missing all of the metadata.
> Changing the code to be
>     boolean parsedMetaData = false;
>     boolean delayLoadContent = false;
>     while (entry != null) {
>         ...
>         } else if (entry.getName().equals("meta.xml")) {
>             meta.parse(zip, new DefaultHandler(), metadata, context);
>             parsedMetaData = true;
>             if (delayLoadContent) {
>                 if (content instanceof OpenDocumentContentParser) {
>                     ((OpenDocumentContentParser) content).parseInternal(zip, handler, metadata, context);
>                 } else {
>                     // Foreign content parser was set:
>                     content.parse(zip, handler, metadata, context);
>                 }
>             }
>         } else if (entry.getName().endsWith("content.xml")) {
>             if (!parsedMetaData) {
>                 delayLoadContent = true;
>             } else {
>                 if (content instanceof OpenDocumentContentParser) {
>                     ((OpenDocumentContentParser) content).parseInternal(zip, handler, metadata, context);
>                 } else {
>                     // Foreign content parser was set:
>                     content.parse(zip, handler, metadata, context);
>                 }
>             }
>         }
> works as expected.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
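
For illustration only, the two processing paths mentioned in the resolution note (random access with metadata first, versus iterating entries in turn) can be sketched with plain java.util.zip. This is not the code committed in r1605124 and it does not use any Tika classes; the class OdfOrderSketch and the methods randomAccessOrder, streamingOrder and sampleZip are hypothetical names made up for this sketch. It assumes an archive in which content.xml happens to be stored before meta.xml, which is the situation described in the report.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: shows only the ORDER in which entries would be handled.
public class OdfOrderSketch {

    // Random access: entries are looked up by name, so meta.xml can always
    // be handled before content.xml, whatever their order in the archive.
    public static List<String> randomAccessOrder(File odf) throws IOException {
        List<String> processed = new ArrayList<>();
        try (ZipFile zip = new ZipFile(odf)) {
            for (String name : new String[] {"meta.xml", "content.xml"}) {
                ZipEntry entry = zip.getEntry(name);
                if (entry != null) {
                    // A real parser would read zip.getInputStream(entry) here.
                    processed.add(name);
                }
            }
        }
        return processed;
    }

    // Plain stream: entries can only be visited in the order they were stored.
    public static List<String> streamingOrder(InputStream odf) throws IOException {
        List<String> processed = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("meta.xml") || name.equals("content.xml")) {
                    processed.add(name);
                }
            }
        }
        return processed;
    }

    // Builds a tiny in-memory archive with content.xml stored BEFORE meta.xml.
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"content.xml", "meta.xml"}) {
                out.putNextEntry(new ZipEntry(name));
                out.write("<doc/>".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = sampleZip();
        File tmp = File.createTempFile("sketch", ".odt");
        try {
            Files.write(tmp.toPath(), data);
            System.out.println("random access: " + randomAccessOrder(tmp));
            System.out.println("streaming:     "
                    + streamingOrder(new ByteArrayInputStream(data)));
        } finally {
            tmp.delete();
        }
    }
}
```

With this sample archive the random-access path visits meta.xml and then content.xml, while the streaming path visits them in stored order (content.xml first), which is why the streamed case cannot guarantee that metadata is available before content is emitted.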