Jukka - The ParserPostProcessor creates a TeeHandler that sends output to the caller's handler and, in addition, to its own WriteOutContentHandler. So, if I understand correctly, when the caller's handler is also a WriteOutContentHandler or equivalent, the full text ends up saved in two StringWriters, no?
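
To make the concern concrete, here is a minimal sketch that uses only the JDK's SAX classes. BufferingHandler and SimpleTee are hypothetical stand-ins for WriteOutContentHandler and TeeHandler; they are not Tika's actual classes, and I am assuming the real ones behave roughly like this for character events.

```java
import java.io.StringWriter;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a handler that keeps all character data in memory.
class BufferingHandler extends DefaultHandler {
    private final StringWriter writer = new StringWriter();

    @Override
    public void characters(char[] ch, int start, int length) {
        writer.write(ch, start, length);
    }

    int bufferedLength() {
        return writer.getBuffer().length();
    }
}

// Hypothetical stand-in for a tee: forwards each character event to two handlers.
class SimpleTee extends DefaultHandler {
    private final ContentHandler first;
    private final ContentHandler second;

    SimpleTee(ContentHandler first, ContentHandler second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        first.characters(ch, start, length);
        second.characters(ch, start, length);
    }
}

class TeeDemo {
    // Returns the number of characters held by the caller's buffer and by the
    // post-processor's internal buffer after the same text passes through the tee.
    static int[] bufferedChars(String text) throws SAXException {
        BufferingHandler callers = new BufferingHandler();
        BufferingHandler internal = new BufferingHandler();
        SimpleTee tee = new SimpleTee(callers, internal);

        char[] chars = text.toCharArray();
        tee.characters(chars, 0, chars.length);

        return new int[] { callers.bufferedLength(), internal.bufferedLength() };
    }

    public static void main(String[] args) throws SAXException {
        int[] sizes = bufferedChars("some extracted text");
        System.out.println("caller buffer: " + sizes[0] + " chars, internal buffer: " + sizes[1] + " chars");
    }
}
```

Both buffers end up holding the whole text, which is the duplication I am asking about. If the caller passes a handler that does not buffer (a plain DefaultHandler, say), only the internal copy exists.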

Regards,
Keith

Jukka Zitting wrote:
>
> Hi,
>
> On 10/22/07, Keith R. Bennett <[EMAIL PROTECTED]> wrote:
>> I was thinking that as a ContentHandler, the user could choose to place all
>> the data in memory, and there would be a single copy of the full text.
>>
>> As the ParserPostProcessor, if I understand correctly, the user is bound to
>> consume the extra memory if using the AutoDetectParser, and we are probably
>> consuming twice as much memory to do so, since we would be saving the full
>> text in two different string writers.
>
> I don't quite follow you. AutoDetectParser never reads the full
> content into memory (of course unless an underlying parser does it).
>
>> So I was thinking of moving the existing logic from the ParserPostProcessor
>> to a ContentHandler implementation.
>
> Sure, why not.
>
> If I understand you correctly, you'd prefer something like this:
>
>     Parser parser = ...;
>     Metadata metadata = new Metadata();
>     parser.parse(..., new FullTextContentHandler(metadata), metadata);
>
> over:
>
>     Parser parser = new ParserPostProcessor(...);
>     Metadata metadata = new Metadata();
>     parser.parse(..., new DefaultHandler(), metadata);
>
> BR,
>
> Jukka Zitting
>

--
View this message in context: http://www.nabble.com/Fulltext-Metadata-Property--tf4643633.html#a13352591
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
