DIH's TikaEntityProcessor's handling of embedded documents

2015-03-04 Thread Allison, Timothy B.
All, I recently took a look at the source code for TikaEntityProcessor, and I noticed that the code is not configuring the ParseContext to have Tika's AutoDetectParser (or any parser) parse documents recursively. That is, if you have a zip file or any other container document, DIH's
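
The behaviour described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `Doc`, `Context`, `ContainerParser` and `extract` are hypothetical stand-ins invented for this example, not Tika or DIH classes. In real Tika the corresponding registration is `context.set(Parser.class, parser)` on a `ParseContext`; the toy `Context` below mimics only that type-keyed lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EmbeddedDocSketch {

    /** Hypothetical document: its own text plus embedded children (e.g. zip entries). */
    static final class Doc {
        final String text;
        final List<Doc> embedded;

        Doc(String text, Doc... embedded) {
            this.text = text;
            this.embedded = List.of(embedded);
        }
    }

    /** Hypothetical stand-in for a parse context: a type-keyed bag of helpers. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            return key.cast(entries.get(key)); // null when nothing is registered
        }
    }

    interface Parser {
        void parse(Doc doc, Context context, List<String> out);
    }

    /** Emits the container's own text, then recurses only if a Parser is registered. */
    static final class ContainerParser implements Parser {
        @Override
        public void parse(Doc doc, Context context, List<String> out) {
            out.add(doc.text);
            Parser recursive = context.get(Parser.class);
            if (recursive == null) {
                return; // no parser in the context: embedded documents are skipped
            }
            for (Doc child : doc.embedded) {
                recursive.parse(child, context, out);
            }
        }
    }

    /** Parses a document, optionally registering the parser for recursive use. */
    static List<String> extract(Doc doc, boolean registerParser) {
        Parser parser = new ContainerParser();
        Context context = new Context();
        if (registerParser) {
            context.set(Parser.class, parser);
        }
        List<String> out = new ArrayList<>();
        parser.parse(doc, context, out);
        return out;
    }

    public static void main(String[] args) {
        Doc zip = new Doc("archive.zip",
                new Doc("a.txt"),
                new Doc("b.pdf", new Doc("attachment")));
        System.out.println(extract(zip, false)); // [archive.zip]
        System.out.println(extract(zip, true));  // [archive.zip, a.txt, b.pdf, attachment]
    }
}
```

With no parser registered in the context, only the container itself is handled; registering one lets the same parser descend into each embedded document.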

Re: DIH's TikaEntityProcessor's handling of embedded documents

2015-03-04 Thread Alexandre Rafalovitch
DIH does not get as much attention as other parts of the system. If you see a clear way to improve it, I'd say go ahead and file an issue. If you can provide a patch that passes the tests and, ideally, includes new tests, that would be even better. Regards, Alex.

RE: DIH's TikaEntityProcessor's handling of embedded documents

2015-03-04 Thread Allison, Timothy B.
Got it. Thank you. SOLR-7189.