Jukka - I was thinking that with a ContentHandler, the user could choose to place all the data in memory, and there would be a single copy of the full text.
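To illustrate the idea, here is a minimal sketch of such a handler using only the standard SAX API. FullTextHandler and FullTextDemo are made-up names, not existing Tika classes, and the sample XHTML string is only a stand-in for the SAX events a parser would emit. It covers full-text collection only, not the summary or outLinks logic.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not a Tika class): collects all character data
// delivered through SAX events into one in-memory buffer.
class FullTextHandler extends DefaultHandler {
    // The only retained copy of the text lives in this buffer.
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Note: toString() creates a new String each time it is called.
    public String getText() {
        return text.toString();
    }
}

// Hypothetical driver: feeds a small document to the handler with the
// JDK's own SAX parser, standing in for whatever parser the caller uses.
class FullTextDemo {
    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>Hello, </p><p>Tika</p></body></html>";
        FullTextHandler handler = new FullTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        System.out.println(handler.getText());
    }
}
```

With something along these lines, a user who does not want the full text in memory would simply pass a different handler, and nothing would be buffered on their behalf.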
With the ParserPostProcessor, if I understand correctly, the user is bound to consume the extra memory when using the AutoDetectParser, and we are probably consuming twice as much memory to do so, since we would be saving the full text in two different string writers.

So I was thinking of moving the existing logic from the ParserPostProcessor to a ContentHandler implementation.

- Keith


Jukka Zitting wrote:
>
> Hi,
>
> On 10/22/07, Keith R. Bennett <[EMAIL PROTECTED]> wrote:
>> > The summary and outLinks implementation based on SAX events may be
>> > more complex but it's still doable, so I'd rather focus on making that
>> > work.
>>
>> That's not something I'd feel confident in implementing correctly, so I
>> won't offer to do that. If you'd like me to implement a simpler, temporary
>> solution, feel free to let me know.
>
> What's wrong with the current code in ParserPostProcessor? The reason
> why I objected to just removing the class is that it already
> implements the functionality that you're asking for.
>
> I'm fine if you want to refactor the class into something else, but I
> don't see the logic of first removing it and then implementing the
> same functionality from scratch.
>
> BR,
>
> Jukka Zitting
>

--
View this message in context: http://www.nabble.com/Fulltext-Metadata-Property--tf4643633.html#a13352214
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
