Hi All,
I've got a use case where I need most of the functionality currently
available in AutoDetectParser, i.e. being able to instantiate the
appropriate parser based on the stream and Metadata. When the parser
returned is the HTML one, our application logic needs to set up a
specific ContentHandler so that we can process the SAX content ourselves.
Unfortunately, this prevents us from reusing AutoDetectParser as
currently defined.
However, a ParserFactory class (which doesn't exist yet) would really
help us here: it could provide public method(s) for what is currently
done internally by the AutoDetectParser class.
One option is to provide something like this:
Parser parser = ParserFactory.getInstance().getParser(stream, metadata);
Another option is the following:
MimeType mt = MimeType.getMimeType(stream, metadata);
Parser parser = ParserFactory.getInstance().getParser(mt);
All the code needed is already there; it's just a matter of moving
things around and creating and initializing a ParserFactory class.
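To make the idea a bit more concrete, here is a rough, self-contained
sketch of how such a factory could look. It is not wired to the real Tika
classes: Parser, HtmlParser and TextParser below are empty stand-ins, the
MIME type is a plain String rather than a MimeType, the metadata is a
simple Map, and "detection" only reads a Content-Type hint. All of these
are assumptions made for illustration, not the existing AutoDetectParser
logic.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Stand-ins for the real parser types, so that the sketch compiles on its own.
interface Parser {}
class HtmlParser implements Parser {}
class TextParser implements Parser {}

// Hypothetical factory exposing the parser lookup as public methods.
final class ParserFactory {
    private static final ParserFactory INSTANCE = new ParserFactory();

    // Registry from MIME type to parser; a real version would be built from configuration.
    private final Map<String, Parser> parsersByType = new HashMap<>();

    private ParserFactory() {
        parsersByType.put("text/html", new HtmlParser());
        parsersByType.put("text/plain", new TextParser());
    }

    static ParserFactory getInstance() {
        return INSTANCE;
    }

    // Second option: the caller has already determined the MIME type.
    Parser getParser(String mimeType) {
        Parser parser = parsersByType.get(mimeType);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for " + mimeType);
        }
        return parser;
    }

    // First option: the factory determines the type itself.
    // Placeholder detection: only the Content-Type hint is consulted here;
    // real detection would also inspect the stream.
    Parser getParser(InputStream stream, Map<String, String> metadata) {
        String mimeType = metadata.getOrDefault("Content-Type", "text/plain");
        return getParser(mimeType);
    }
}

class ParserFactoryDemo {
    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("Content-Type", "text/html");
        InputStream stream = new ByteArrayInputStream(new byte[0]);

        Parser parser = ParserFactory.getInstance().getParser(stream, metadata);

        // The caller can now choose its own handler depending on the parser type.
        if (parser instanceof HtmlParser) {
            System.out.println("HTML parser selected: attach the custom ContentHandler");
        } else {
            System.out.println("Other parser selected: use the default handling");
        }
    }
}
```

The point of the sketch is only that the caller gets the Parser back and
can decide which ContentHandler to use before parsing, which is what we
cannot do today.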
If all this makes sense to you guys, please let me know so I can go
ahead, submit a ticket, implement this and send a patch.
All the best,
Stephane