Hi,
On 9/23/07, kbennett <[EMAIL PROTECTED]> wrote:
> 1) I suggest we create a class to store the parsed document content, rather
> than just a Map. The class could have convenience methods such as
> getStringContent(), and possibly hold onto a resource identifier that could
> be set. We might also want to make the parsed values immutable.
This is what I had in mind for the Metadata instance in my proposed
Parser interface design. I think I have a reasonable evolutionary path
designed for transforming the current Parser interfaces to this
proposed model. Something like this:
    current: List<Content> getContents();
    TIKA-26: Map<String,Content> getContents();
    TIKA-n1: Map<String,Content> parse(InputStream stream);
    TIKA-n2: String parse(InputStream stream, Map<String,Content> metadata);
    TIKA-n3: String parse(InputStream stream, Metadata metadata);
    TIKA-n4: void parse(InputStream stream, ContentHandler handler,
                        Metadata metadata);
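To make the end state (TIKA-n4) a bit more concrete, here is a rough Java sketch. Only the parse signature is taken from the list above; the Metadata methods (set, get) and the thrown exceptions are placeholders I am assuming for illustration, not an agreed API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Hypothetical metadata holder. The class name comes from the proposal;
// the set/get methods are assumptions made for this sketch.
final class Metadata {
    private final Map<String, String> values = new LinkedHashMap<String, String>();

    void set(String name, String value) {
        values.put(name, value);
    }

    String get(String name) {
        return values.get(name);
    }
}

// The TIKA-n4 shape: text content goes to the handler as SAX events,
// document properties go into the Metadata instance passed by the caller.
interface Parser {
    void parse(InputStream stream, ContentHandler handler, Metadata metadata)
            throws IOException, SAXException;
}
```

With this shape the parser itself keeps no per-document state: everything it produces ends up in the handler or in the Metadata object supplied by the caller.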
> 2) If we make the Parser stateless, how will we deal with the chunking of
> large documents?
By making the parse method output SAX events instead of a single
String that contains the text content of the entire document.
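A minimal sketch of the idea, assuming a plain-text input encoded as UTF-8. The class name StreamingTextParser and the 4096-character buffer are made up for the example, and the metadata argument is left out for brevity; only the org.xml.sax.ContentHandler calls are the real SAX API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Hypothetical stateless parser: it has no fields, so all state
// lives in local variables for the duration of a single parse call.
final class StreamingTextParser {

    void parse(InputStream stream, ContentHandler handler)
            throws IOException, SAXException {
        // Assumption: the input is UTF-8 text.
        Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8);
        char[] buffer = new char[4096];

        handler.startDocument();
        int n;
        // Each chunk is handed to the handler as soon as it is read,
        // so the full document text is never held in memory at once.
        while ((n = reader.read(buffer)) != -1) {
            handler.characters(buffer, 0, n);
        }
        handler.endDocument();
    }
}
```

A client that wants the old behaviour can still collect the characters events into a single String, while one that handles large documents can process or index each chunk as it arrives.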
BR,
Jukka Zitting