DIH does not get as much attention as other parts of the system. If
you see a clear way to improve it, I'd say go ahead and file the
issue. If you can provide a patch that passes the existing tests and,
ideally, includes new tests, that would be even better.

Regards,
   Alex.

----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 4 March 2015 at 11:06, Allison, Timothy B. <talli...@mitre.org> wrote:
> All,
>
>   I recently took a look at the source code for TikaEntityProcessor, and I 
> noticed that the code is not configuring the ParseContext to have Tika's 
> AutoDetectParser (or any parser) parse documents recursively.  That is, if 
> you have a zip file or any other container document, DIH's 
> TikaEntityProcessor is not configured to handle/parse/extract contents from 
> the embedded documents.
>
>   Is this the intended behavior?  Is this what users expect?
>
>   The change is trivial, and it probably should be configurable whether or 
> not to have DIH parse recursively.
>
>   Many apologies if this is a known issue or a non-issue.
>
>    If this is actually an issue, I'll be happy to open an issue and supply a 
> patch.
>
>
>          Best,
>
>                  Tim
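
For readers of this thread, here is a minimal sketch of the kind of change
described above. It is not the actual TikaEntityProcessor code or the eventual
patch. It only assumes Tika's public API, where registering a parser in the
ParseContext under Parser.class is what lets embedded documents be parsed. The
class name RecursiveTikaSketch, the method extract and the flag recurse are
hypothetical names chosen for illustration.

```java
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical illustration; not taken from the DIH source.
public class RecursiveTikaSketch {

    // Extracts text from the stream. When recurse is true, the parser is
    // registered in the ParseContext so that embedded documents (for
    // example, the entries of a zip file) are parsed as well.
    static String extract(InputStream in, boolean recurse) throws Exception {
        Parser parser = new AutoDetectParser();
        ParseContext context = new ParseContext();
        if (recurse) {
            // Without this line, Tika has no parser to hand embedded
            // documents to, so their contents are not extracted.
            context.set(Parser.class, parser);
        }
        // -1 disables the default write limit on extracted characters.
        BodyContentHandler handler = new BodyContentHandler(-1);
        parser.parse(in, handler, new Metadata(), context);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: RecursiveTikaSketch <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(extract(in, true));
        }
    }
}
```

The boolean flag stands in for the configurable option suggested in the
message. How it would be exposed in the DIH configuration is left open here.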
