DIH does not get as much attention as other parts of the system. If you see a clear way to improve it, I'd say go ahead and file the issue. If you can provide a patch that passes the tests and, ideally, includes new tests, that would be even better.
Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/

On 4 March 2015 at 11:06, Allison, Timothy B. <talli...@mitre.org> wrote:
> All,
>
> I recently took a look at the source code for TikaEntityProcessor, and I
> noticed that the code is not configuring the ParseContext to have Tika's
> AutoDetectParser (or any parser) parse documents recursively. That is, if
> you have a zip file or any other container document, DIH's
> TikaEntityProcessor is not configured to handle/parse/extract contents from
> the embedded documents.
>
> Is this the intended behavior? Is this what users expect?
>
> The change is trivial, and it probably should be configurable whether or
> not to have DIH parse recursively.
>
> Many apologies if this is a known issue or a non-issue.
>
> If this is actually an issue, I'll be happy to open an issue and supply a
> patch.
>
> Best,
>
> Tim