All,
I recently took a look at the source code for TikaEntityProcessor, and I
noticed that the code is not configuring the ParseContext to have Tika's
AutoDetectParser (or any parser) parse documents recursively. That is, if you
have a zip file or any other container document, DIH's
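
To make the container-document point concrete, here is a minimal sketch in plain Java (java.util.zip only, no Tika or DIH code) of the difference between listing a container's top-level entries and descending into nested containers. The class and method names (RecursiveContainerSketch, listEntries, zipOf) are hypothetical and exist only for this illustration. In Tika itself, the analogous step is registering a parser in the ParseContext, e.g. context.set(Parser.class, parser), so that embedded documents are handed back to a parser instead of being skipped.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class RecursiveContainerSketch {

    // Hypothetical helper: lists entry names in a zip stream.
    // With recurse == true, entries ending in ".zip" are opened and listed too.
    public static List<String> listEntries(InputStream in, String prefix, boolean recurse)
            throws IOException {
        List<String> names = new ArrayList<>();
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            String path = prefix + entry.getName();
            names.add(path);
            if (recurse && entry.getName().endsWith(".zip")) {
                // Buffer the nested container, then treat it as a new document.
                ByteArrayOutputStream nested = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = zip.read(buf)) != -1) {
                    nested.write(buf, 0, n);
                }
                names.addAll(listEntries(
                        new ByteArrayInputStream(nested.toByteArray()), path + "!/", true));
            }
        }
        return names;
    }

    // Hypothetical helper: builds an in-memory zip with a single entry.
    public static byte[] zipOf(String name, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry(name));
            out.write(content);
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] inner = zipOf("inner.txt", "hello".getBytes(StandardCharsets.UTF_8));
        byte[] outer = zipOf("inner.zip", inner);
        // Non-recursive: only the top-level entry is seen.
        System.out.println(listEntries(new ByteArrayInputStream(outer), "", false));
        // Recursive: the embedded document is reached as well.
        System.out.println(listEntries(new ByteArrayInputStream(outer), "", true));
    }
}
```

Without recursion the text file inside inner.zip is never visited, which is the kind of gap described above for embedded documents.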
DIH does not get as much attention as other parts of the system. If
you see a clear way to improve it, I'd say go ahead and file the
issue. If you can provide a patch that passes the tests and,
ideally, includes new tests, that would be even better.
Regards,
Alex.
Solr Analyzers,
Got it. Thank you. SOLR-7189.
-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Wednesday, March 04, 2015 12:31 PM
To: solr-user
Subject: Re: DIH's TikaEntityProcessor's handling of embedded documents
DIH does not get as much attention as other parts