[ 
https://issues.apache.org/jira/browse/TIKA-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved TIKA-131.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.2-incubating

Resolved in revision 638656.

> Lazy XHTML prefix generation
> ----------------------------
>
>                 Key: TIKA-131
>                 URL: https://issues.apache.org/jira/browse/TIKA-131
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 0.2-incubating
>
>
> The XHTMLContentHandler utility class is used by many Tika parsers to 
> generate XHTML output. Among other things, the XHTMLContentHandler 
> automatically generates the following XHTML skeleton:
>     <html xmlns="http://www.w3.org/1999/xhtml">
>       <head>
>         <title>...</title>
>       </head>
>       <body>
>         ...
>       </body>
>     </html>
> The <title/> tag (and potentially other metadata in future) is based on the 
> Metadata.TITLE property of the document being parsed. Unfortunately that 
> metadata is often not yet available when the XHTML generation is started, as 
> a typical usage pattern is:
>     XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
>     xhtml.startDocument();
>     // parse the document
>     xhtml.endDocument();
> We can avoid the problem in many cases by postponing the XHTML prefix 
> generation until the parser actually starts to produce SAX events.
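
The postponed prefix generation described above could be sketched roughly as
follows. This is only an illustration under stated assumptions, not the code
committed in revision 638656: LazyXhtmlSketch, lazyPrefix() and demo() are
hypothetical names, and a plain Map with a "title" key stands in for Tika's
Metadata object and Metadata.TITLE.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for XHTMLContentHandler; not the Tika class. */
class LazyXhtmlSketch {
    private static final String XHTML = "http://www.w3.org/1999/xhtml";
    private static final Attributes NO_ATTRS = new AttributesImpl();

    private final ContentHandler out;
    private final Map<String, String> metadata; // assumption: stands in for Tika's Metadata
    private boolean prefixWritten = false;

    LazyXhtmlSketch(ContentHandler out, Map<String, String> metadata) {
        this.out = out;
        this.metadata = metadata;
    }

    /** Starts the document but does not emit the XHTML skeleton yet. */
    void startDocument() throws SAXException {
        out.startDocument();
    }

    /** Emits the html/head/title/body prefix exactly once, on first use. */
    private void lazyPrefix() throws SAXException {
        if (prefixWritten) {
            return;
        }
        prefixWritten = true;
        open("html");
        open("head");
        open("title");
        String title = metadata.get("title"); // read as late as possible
        if (title != null) {
            out.characters(title.toCharArray(), 0, title.length());
        }
        close("title");
        close("head");
        open("body");
    }

    void startElement(String name) throws SAXException {
        lazyPrefix();
        open(name);
    }

    void endElement(String name) throws SAXException {
        close(name);
    }

    void characters(String text) throws SAXException {
        lazyPrefix();
        out.characters(text.toCharArray(), 0, text.length());
    }

    /** Closes the skeleton; writes the prefix first if no content was produced. */
    void endDocument() throws SAXException {
        lazyPrefix();
        close("body");
        close("html");
        out.endDocument();
    }

    private void open(String name) throws SAXException {
        out.startElement(XHTML, name, name, NO_ATTRS);
    }

    private void close(String name) throws SAXException {
        out.endElement(XHTML, name, name);
    }

    /** Runs the usage pattern with a title that only becomes known mid-parse. */
    static String demo() {
        StringBuilder sb = new StringBuilder();
        ContentHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                sb.append('<').append(local).append('>');
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                sb.append("</").append(local).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                sb.append(ch, start, length);
            }
        };
        Map<String, String> metadata = new HashMap<>();
        LazyXhtmlSketch xhtml = new LazyXhtmlSketch(recorder, metadata);
        try {
            xhtml.startDocument();
            metadata.put("title", "Late title"); // discovered while parsing
            xhtml.startElement("p");
            xhtml.characters("hello");
            xhtml.endElement("p");
            xhtml.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }
}
```

In this sketch the title is set after startDocument() but before the first
element, so it still ends up inside the generated <title> element. Metadata
that only becomes available after the first SAX event would still be missed,
which matches the "in many cases" caveat in the description.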

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.