[ https://issues.apache.org/jira/browse/TIKA-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-131.
--------------------------------

    Resolution: Fixed
 Fix Version/s: 0.2-incubating

Resolved in revision 638656.

> Lazy XHTML prefix generation
> ----------------------------
>
>                 Key: TIKA-131
>                 URL: https://issues.apache.org/jira/browse/TIKA-131
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 0.2-incubating
>
>
> The XHTMLContentHandler utility class is used by many Tika parsers to
> generate XHTML output. Among other things, the XHTMLContentHandler
> automatically generates the following XHTML skeleton:
>
>     <html xmlns="http://www.w3.org/1999/xhtml">
>       <head>
>         <title>...</title>
>       </head>
>       <body>
>         ...
>       </body>
>     </html>
>
> The <title/> tag (and potentially other metadata in future) is based on the
> Metadata.TITLE property of the document being parsed. Unfortunately that
> metadata is often not yet available when the XHTML generation is started, as
> a typical usage pattern is:
>
>     XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
>     xhtml.startDocument();
>     // parse the document
>     xhtml.endDocument();
>
> We can avoid the problem in many cases by postponing the XHTML prefix
> generation to when the parser actually starts to produce some SAX events.
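The postponement proposed in TIKA-131 can be sketched as a SAX filter that writes the html/head/title/body prefix only when the first content event arrives, reading the title at that moment rather than in startDocument(). This is a minimal, hypothetical sketch, not the code committed in revision 638656: the class name LazyXhtmlHandler, the Supplier<String> standing in for the Metadata.TITLE lookup, and the Recorder test handler are all assumptions, and only the JDK's org.xml.sax API is used (no Tika classes).

```java
import java.util.function.Supplier;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical sketch of lazy XHTML prefix generation (not Tika's actual
 * XHTMLContentHandler). The prefix is written just before the first content
 * event, so a title discovered after startDocument() still ends up in <title>.
 */
public class LazyXhtmlHandler extends XMLFilterImpl {
    private static final String XHTML = "http://www.w3.org/1999/xhtml";
    private static final Attributes EMPTY = new AttributesImpl();

    private final Supplier<String> title; // assumption: stands in for Metadata.TITLE
    private boolean prefixWritten = false;

    public LazyXhtmlHandler(ContentHandler handler, Supplier<String> title) {
        setContentHandler(handler);
        this.title = title;
    }

    /** Writes the XHTML prefix exactly once, reading the title only now. */
    private void lazyPrefix() throws SAXException {
        if (prefixWritten) {
            return;
        }
        prefixWritten = true;
        ContentHandler h = getContentHandler();
        h.startElement(XHTML, "html", "html", EMPTY);
        h.startElement(XHTML, "head", "head", EMPTY);
        h.startElement(XHTML, "title", "title", EMPTY);
        String t = title.get();
        if (t != null) {
            h.characters(t.toCharArray(), 0, t.length());
        }
        h.endElement(XHTML, "title", "title");
        h.endElement(XHTML, "head", "head");
        h.startElement(XHTML, "body", "body", EMPTY);
    }

    // startDocument() is inherited from XMLFilterImpl: it is forwarded
    // downstream immediately, but no prefix is written yet.

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        lazyPrefix();
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        lazyPrefix();
        super.characters(ch, start, length);
    }

    @Override
    public void endDocument() throws SAXException {
        lazyPrefix(); // an empty document still gets the full skeleton
        ContentHandler h = getContentHandler();
        h.endElement(XHTML, "body", "body");
        h.endElement(XHTML, "html", "html");
        super.endDocument();
    }

    /** Minimal downstream handler that records events as tag text. */
    static class Recorder extends DefaultHandler {
        final StringBuilder out = new StringBuilder();
        @Override public void startElement(String u, String l, String q, Attributes a) {
            out.append('<').append(l).append('>');
        }
        @Override public void endElement(String u, String l, String q) {
            out.append("</").append(l).append('>');
        }
        @Override public void characters(char[] ch, int s, int n) {
            out.append(ch, s, n);
        }
    }

    public static void main(String[] args) throws SAXException {
        Recorder rec = new Recorder();
        String[] metadataTitle = {null}; // title not yet known at startDocument()
        LazyXhtmlHandler xhtml = new LazyXhtmlHandler(rec, () -> metadataTitle[0]);
        xhtml.startDocument();
        metadataTitle[0] = "Report"; // the parser discovers the title here
        xhtml.characters("hello".toCharArray(), 0, 5);
        xhtml.endDocument();
        System.out.println(rec.out);
    }
}
```

Running main prints <html><head><title>Report</title></head><body>hello</body></html>: the title set after startDocument() still reaches the <title> element, which an eager prefix written inside startDocument() would have left empty. As the issue notes, this only helps "in many cases": metadata that becomes available after the parser's first SAX event is still too late for the prefix.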