Hi,
On 10/22/07, Keith R. Bennett <[EMAIL PROTECTED]> wrote:
> I was thinking that as a ContentHandler, the user could choose to place all
> the data in memory, and there would be a single copy of the full text.
>
> As the ParserPostProcessor, if I understand correctly, the user is bound to
> consume the extra memory if using the AutoDetectParser, and we are probably
> consuming twice as much memory to do so, since we would be saving the full
> text in two different string writers.
I don't quite follow you. AutoDetectParser never reads the full
content into memory (unless, of course, an underlying parser does so).
> So I was thinking of moving the existing logic from the ParserPostProcessor
> to a ContentHandler implementation.
Sure, why not.
If I understand you correctly, you'd prefer something like this:

    Parser parser = ...;
    Metadata metadata = new Metadata();
    parser.parse(..., new FullTextContentHandler(metadata), metadata);

over:

    Parser parser = new ParserPostProcessor(...);
    Metadata metadata = new Metadata();
    parser.parse(..., new DefaultHandler(), metadata);
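
For illustration, here is a rough sketch of what such a FullTextContentHandler might look like. It is not existing Tika code, only one possible shape under a few assumptions: a plain Map stands in for Metadata so the example compiles against the JDK alone, the "fulltext" key name is made up, and the JDK's SAX parser stands in for a real Parser. The handler buffers the character events once and stores the result when the document ends, so only one copy of the text is kept.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: collects all SAX character data in a single buffer
// and stores it in the metadata when the document ends.
// Assumption: a Map<String, String> stands in for Tika's Metadata class.
public class FullTextContentHandler extends DefaultHandler {

    // Assumed key name, chosen only for this example.
    public static final String FULLTEXT_KEY = "fulltext";

    private final Map<String, String> metadata;
    private final StringBuilder text = new StringBuilder();

    public FullTextContentHandler(Map<String, String> metadata) {
        this.metadata = metadata;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append each chunk of character data as the parser reports it.
        text.append(ch, start, length);
    }

    @Override
    public void endDocument() {
        // Publish the accumulated text once parsing has finished.
        metadata.put(FULLTEXT_KEY, text.toString());
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> metadata = new HashMap<>();
        String xhtml = "<html><body><p>Hello, Tika</p></body></html>";

        // The JDK SAX parser plays the role of a Tika Parser here.
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xhtml)),
                new FullTextContentHandler(metadata));

        System.out.println(metadata.get(FULLTEXT_KEY));
    }
}
```

With this approach the caller decides whether to pay for the in-memory copy at all: passing a different ContentHandler skips the buffering entirely.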
BR,
Jukka Zitting