Hi,

On 10/19/07, Keith R. Bennett <[EMAIL PROTECTED]> wrote:
> The summary and outlinks information may need the text from multiple SAX
> events, so their implementation may not be trivial, unless we accumulate all
> parsed text in a single string, and then inspect that string (as you did in
> ParserPostProcessor).
>
> Therefore, since fulltext, summary, and outlinks all benefit from all text
> being in a single string, why not create a single implementation of
> ContentHandler that populates all of them?  Then the full text string would
> be in only one place in Tika.

A summary and outLinks implementation based on SAX events may be
more complex, but it's still doable, so I'd rather focus on making that
work. The more places we have in Tika that read the whole content into
a single string, the harder it will be to support really large
documents.
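
As a rough illustration of the SAX-based approach, here is a minimal
sketch, not actual Tika code. The class name StreamingSummaryHandler,
the maxSummaryLength parameter, and the assumption that links arrive as
XHTML-style <a href="..."> elements are all hypothetical. It collects
outlinks and a length-bounded summary across multiple characters()
events without ever holding the full text:

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of Tika: gathers outlinks and a bounded
// summary directly from SAX events instead of buffering the whole document.
class StreamingSummaryHandler extends DefaultHandler {
    private final int maxSummaryLength;
    private final StringBuilder summary = new StringBuilder();
    private final List<String> outlinks = new ArrayList<>();

    StreamingSummaryHandler(int maxSummaryLength) {
        this.maxSummaryLength = maxSummaryLength;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // Assumption: links are emitted as <a href="..."> elements.
        String name = localName.isEmpty() ? qName : localName;
        if ("a".equalsIgnoreCase(name)) {
            String href = atts.getValue("href");
            if (href != null) {
                outlinks.add(href);
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text can be split across several calls; keep only what still fits,
        // so memory use stays bounded regardless of document size.
        int room = maxSummaryLength - summary.length();
        if (room > 0) {
            summary.append(ch, start, Math.min(room, length));
        }
    }

    String getSummary() {
        return summary.toString();
    }

    List<String> getOutlinks() {
        return outlinks;
    }
}
```

A handler like this could be passed to any SAX parser, for example
SAXParserFactory.newInstance().newSAXParser().parse(stream, handler).
The memory cost is then the summary limit plus the list of links, not
the size of the document.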

BR,

Jukka Zitting
