Re: Multiple documents per input stream

Jukka Zitting Sun, 27 Sep 2009 06:18:59 -0700

Hi,

On Sun, Sep 27, 2009 at 2:59 PM, Ken Krugler
<[email protected]> wrote:
> Though I don't think this would address the fundamental question of how to
> generically extract metadata like the title from compound documents, right?
>
> You'd still have to know something about how the delegate parser embeds this
> information in the actual XHTML output.


Not necessarily, as the delegate parser could well decide to process
the document in some other way (create a separate Lucene index entry,
etc.) than simply reporting the extracted text back to the top-level
parser.

Such use does bend the Parser interface contract, but it does allow
you to do pretty much anything you want with the component documents.
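
As a rough illustration of that idea, here is a minimal, self-contained
sketch. It does not use the real Tika Parser interface: ComponentParser,
IndexingDelegate and parseCompound are hypothetical names made up for this
example, and the "index" is just an in-memory list standing in for
something like a Lucene index. The point is only that the delegate can
keep each component document to itself instead of writing text back to
the top-level output.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DelegateSketch {

    /** Hypothetical stand-in for a parser interface; NOT the real Tika Parser signature. */
    interface ComponentParser {
        void parse(InputStream stream, StringBuilder textOut) throws IOException;
    }

    /**
     * Delegate that stores each component document as its own entry
     * (an in-memory list here, assumed to stand in for a separate index)
     * and reports nothing back to the top-level parser.
     */
    static class IndexingDelegate implements ComponentParser {
        final List<String> indexEntries = new ArrayList<>();

        @Override
        public void parse(InputStream stream, StringBuilder textOut) throws IOException {
            String text = new String(stream.readAllBytes(), StandardCharsets.UTF_8);
            indexEntries.add(text);
            // Deliberately leave textOut untouched: this is the "bending"
            // of the usual contract, where text is reported to the caller.
        }
    }

    /** Hypothetical top-level parser: hands every component document to the delegate. */
    static String parseCompound(List<String> components, ComponentParser delegate) {
        StringBuilder out = new StringBuilder();
        for (String component : components) {
            InputStream stream =
                    new ByteArrayInputStream(component.getBytes(StandardCharsets.UTF_8));
            try {
                delegate.parse(stream, out);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        IndexingDelegate delegate = new IndexingDelegate();
        String topLevelText =
                parseCompound(List.of("first attachment", "second attachment"), delegate);
        System.out.println("top-level text length: " + topLevelText.length());
        System.out.println("separate entries: " + delegate.indexEntries);
    }
}
```

With a delegate like this, the top-level output stays empty while every
component ends up as its own entry, so per-component metadata such as a
title could be attached to that entry rather than recovered from the
combined XHTML.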

BR,

Jukka Zitting
