Hi All,

I've got a use case where I need most of the functionality currently available in AutoDetectParser, i.e. being able to instantiate the appropriate parser based on the stream and Metadata. When the parser returned is the HTML one, our application logic needs to set up a specific ContentHandler so that we can process the SAX content ourselves. Unfortunately, this prevents us from reusing AutoDetectParser as currently defined. However, a ParserFactory class (which doesn't exist yet) would really help us here: it could provide public method(s) to do what's currently done internally by AutoDetectParser.


One option is to provide something like this:
Parser parser = ParserFactory.getInstance().getParser(stream, metadata);


Another option is the following:
MimeType mt = MimeType.getMimeType(stream, metadata);
Parser parser = ParserFactory.getInstance().getParser(mt);
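
To make the two options a bit more concrete, here is a rough, self-contained sketch of how such a factory could be shaped. Everything in it is an assumption for illustration only: SketchParser stands in for Tika's Parser, a plain String stands in for MimeType, a Map stands in for Metadata, and the detect() helper is a toy sniffing rule, not the detection logic Tika actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for Tika's Parser, reduced to what the sketch needs.
interface SketchParser {
    String name();
}

// Hypothetical factory; none of these names are existing Tika API.
final class ParserFactorySketch {
    private static final ParserFactorySketch INSTANCE = new ParserFactorySketch();

    private final Map<String, SketchParser> parsersByType = new HashMap<>();
    private final SketchParser fallback = () -> "fallback";

    private ParserFactorySketch() {
        // Illustrative registrations; a real factory would read its configuration.
        parsersByType.put("text/html", () -> "html");
        parsersByType.put("text/plain", () -> "text");
    }

    static ParserFactorySketch getInstance() {
        return INSTANCE;
    }

    // Second option: look the parser up from an already detected type.
    SketchParser getParser(String mimeType) {
        return parsersByType.getOrDefault(mimeType, fallback);
    }

    // First option: detect the type, then delegate to the lookup above.
    SketchParser getParser(InputStream stream, Map<String, String> metadata) {
        return getParser(detect(stream, metadata));
    }

    // Toy detection: trust a Content-Type hint, else peek at the first bytes.
    static String detect(InputStream stream, Map<String, String> metadata) {
        String hint = metadata.get("Content-Type");
        if (hint != null) {
            return hint;
        }
        if (!stream.markSupported()) {
            return "application/octet-stream";
        }
        try {
            stream.mark(16);
            byte[] head = new byte[16];
            int n = stream.read(head, 0, head.length);
            stream.reset(); // leave the stream untouched for the chosen parser
            String prefix = n > 0
                    ? new String(head, 0, n, StandardCharsets.US_ASCII).toLowerCase(Locale.ROOT)
                    : "";
            boolean looksLikeHtml = prefix.startsWith("<html") || prefix.startsWith("<!doctype html");
            return looksLikeHtml ? "text/html" : "application/octet-stream";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something along these lines, the calling code can inspect which parser came back and, for the HTML case only, plug in its own ContentHandler before parsing, which is the part AutoDetectParser currently hides from us.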

All the code needed is already there; it's simply a matter of moving things around and creating/initializing a ParserFactory class.

If all this makes sense to you guys, please let me know so I can go ahead, submit a ticket, implement this and send a patch.

All the best,

Stephane
