Hi,

On Sun, Sep 27, 2009 at 2:59 PM, Ken Krugler
<[email protected]> wrote:
> Though I don't think this would address the fundamental question of how to
> generically extract metadata like the title from compound documents, right?
>
> You'd still have to know something about how the delegate parser embeds this
> information in the actual XHTML output.

Not necessarily, as the delegate parser could well decide to process
the document in some other way (create a separate Lucene index entry,
etc.) than simply reporting the extracted text back to the top-level
parser.

Such use does bend the Parser interface contract, but it does allow
you to do pretty much anything you want with the component documents.
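
As a rough illustration of the idea, here is a minimal sketch of a delegate that files each component document as its own entry instead of reporting text back to the caller. Everything in it is hypothetical: ComponentParser is a simplified stand-in and not Tika's actual Parser interface, InMemoryIndex stands in for something like a Lucene index, and the component's title is assumed to arrive in the metadata map.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DelegateSketch {

    /** Hypothetical, simplified parser contract (NOT Tika's Parser interface). */
    interface ComponentParser {
        void parse(InputStream stream, StringBuilder textOut,
                   Map<String, String> metadata) throws IOException;
    }

    /** Hypothetical stand-in for a real index: one map of fields per entry. */
    static class InMemoryIndex {
        final List<Map<String, String>> entries = new ArrayList<>();

        void add(Map<String, String> fields) {
            entries.add(new HashMap<>(fields));
        }
    }

    /** Delegate that stores each component separately instead of emitting text. */
    static class IndexingDelegate implements ComponentParser {
        private final InMemoryIndex index;

        IndexingDelegate(InMemoryIndex index) {
            this.index = index;
        }

        @Override
        public void parse(InputStream stream, StringBuilder textOut,
                          Map<String, String> metadata) throws IOException {
            // Assumption for brevity: the component is plain UTF-8 text.
            String text = new String(stream.readAllBytes(), StandardCharsets.UTF_8);

            Map<String, String> entry = new HashMap<>(metadata);
            entry.put("content", text);
            index.add(entry);

            // textOut is deliberately left untouched: this is the part that
            // bends the contract, since the caller sees no extracted text.
        }
    }

    public static void main(String[] args) throws IOException {
        InMemoryIndex index = new InMemoryIndex();
        ComponentParser delegate = new IndexingDelegate(index);

        StringBuilder topLevelText = new StringBuilder();
        Map<String, String> metadata = new HashMap<>();
        metadata.put("title", "Attachment A");

        byte[] body = "text of the attachment".getBytes(StandardCharsets.UTF_8);
        delegate.parse(new ByteArrayInputStream(body), topLevelText, metadata);

        System.out.println("index entries: " + index.entries.size());
        System.out.println("top-level text length: " + topLevelText.length());
    }
}
```

With a delegate like this, the component's title stays attached to its own entry, so nothing has to be recovered from the top-level XHTML output.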

BR,

Jukka Zitting
