Hi,

On Sun, Sep 27, 2009 at 2:59 PM, Ken Krugler <[email protected]> wrote:
> Though I don't think this would address the fundamental question of how to
> generically extract metadata like the title from compound documents, right?
>
> You'd still have to know something about how the delegate parser embeds this
> information in the actual XHTML output.
Not necessarily. The delegate parser could well decide to process the document in some way other than simply reporting the extracted text back to the top-level parser (e.g. by creating a separate Lucene index entry for it). Such use does bend the Parser interface contract, but it does allow you to do pretty much anything you want with the component documents.

BR,

Jukka Zitting
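The idea of a delegate parser that handles component documents itself, rather than reporting their text back to the top-level parser, can be illustrated with a minimal Java sketch. It assumes hypothetical, simplified types (`TextSink`, `ComponentParser`, `IndexEntry`, `IndexingDelegateParser`, `ContainerParser`) that are not the Apache Tika API. A toy container parser for a made-up "main text|component|component" format writes its own text to the caller and hands each component to a delegate, which records it as a separate index entry (standing in for a Lucene document) instead of emitting it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a SAX-style content handler: receives extracted text.
interface TextSink {
    void text(String s);
}

// Hypothetical, simplified parser contract (NOT the Apache Tika Parser interface).
interface ComponentParser {
    void parse(String document, TextSink sink);
}

// Hypothetical entry in a separate index, e.g. one Lucene document per component.
final class IndexEntry {
    final String name;
    final String text;

    IndexEntry(String name, String text) {
        this.name = name;
        this.text = text;
    }

    @Override
    public String toString() {
        return name + "=" + text;
    }
}

// Delegate parser that "bends the contract": instead of reporting a component's
// text to the caller's sink, it records the component as its own index entry.
class IndexingDelegateParser implements ComponentParser {
    final List<IndexEntry> index = new ArrayList<>();

    @Override
    public void parse(String document, TextSink sink) {
        index.add(new IndexEntry("component-" + (index.size() + 1), document));
        // Nothing is written to sink, so the top-level output never sees this text.
    }
}

// Toy container parser for a made-up format: "main text|component|component...".
class ContainerParser implements ComponentParser {
    private final ComponentParser delegate;

    ContainerParser(ComponentParser delegate) {
        this.delegate = delegate;
    }

    @Override
    public void parse(String document, TextSink sink) {
        String[] parts = document.split("\\|");
        sink.text(parts[0]); // the container's own text goes to the caller
        for (int i = 1; i < parts.length; i++) {
            delegate.parse(parts[i], sink); // each component goes to the delegate
        }
    }
}

public class Main {
    public static void main(String[] args) {
        IndexingDelegateParser delegate = new IndexingDelegateParser();
        StringBuilder topLevel = new StringBuilder();
        new ContainerParser(delegate).parse("cover page|attachment one|attachment two", topLevel::append);
        System.out.println("top-level text: " + topLevel);        // cover page
        System.out.println("separate entries: " + delegate.index); // [component-1=attachment one, component-2=attachment two]
    }
}
```

The caller's output contains only "cover page"; the components are available solely through the delegate's own index, which is the flexibility (and the contract bending) described above.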
