[DISCUSS] Integrate Apache Any23 into Apache Tika

2013-10-18 Thread Lewis John Mcgibbney
Hi Tika Devs/PMC, This thread is aimed at recognizing the common ground shared by Any23 and Tika, in an attempt to possibly integrate Any23 into Tika. First, however, it will serve a purpose for me to put this into context and also provide some rationale behind this initiative. It is my understanding

Re: [DISCUSS] Integrate Apache Any23 into Apache Tika

2013-10-18 Thread Ken Krugler
Hi Lewis, I haven't had much time to look into Any23, which includes reviewing Markus's patch for integrating some portions of that into Tika (see https://issues.apache.org/jira/browse/TIKA-980). The main challenge I see is that Tika seems to do best as a wrapper for other parsers, versus

Re: [DISCUSS] Integrate Apache Any23 into Apache Tika

2013-10-18 Thread Julien Nioche
Hi, I had a look at Any23 some time ago and found that it did indeed overlap with quite a few other projects, but could (should?) have either relied on those projects (e.g. parsing and mimetype stuff to Tika) or delegated the functionality altogether (e.g. crawling to Nutch) instead of reinventing