Re: [DISCUSS] Integrate Apache Any23 into Apache Tika

2013-11-04 Thread Lewis John Mcgibbney
to be quite handy. Ken, Julien and Chris, please see my replies inline. I've posted them in the order I received them in the digest email. Thanks Lewis On Sat, Oct 19, 2013 at 8:31 PM, dev-digest-h...@tika.apache.org wrote: [DISCUSS] Integrate Apache Any23 into Apache Tika 10047 by: Lewis John

Re: [DISCUSS] Integrate Apache Any23 into Apache Tika

2013-10-19 Thread Chris Mattmann
Mcgibbney lewis.mcgibb...@gmail.com Reply-To: dev@tika.apache.org Date: Friday, October 18, 2013 7:30 AM To: dev@tika.apache.org Subject: [DISCUSS] Integrate Apache Any23 into Apache Tika Hi Tika Devs/PMC, This thread is aimed at recognizing common ground shared

[DISCUSS] Integrate Apache Any23 into Apache Tika

2013-10-18 Thread Lewis John Mcgibbney
Hi Tika Devs/PMC, This thread is aimed at recognizing the common ground shared by Any23 and Tika, with a view to possibly integrating Any23 into Tika. First, however, it will help to put this into context and provide some rationale behind this initiative. It is my understanding

Re: [DISCUSS] Integrate Apache Any23 into Apache Tika

2013-10-18 Thread Ken Krugler
Hi Lewis, I haven't had much time to look into Any23, which includes reviewing Markus's patch for integrating some portions of that into Tika (see https://issues.apache.org/jira/browse/TIKA-980). The main challenge I see is that Tika seems to do best as a wrapper for other parsers, versus

Re: [DISCUSS] Integrate Apache Any23 into Apache Tika

2013-10-18 Thread Julien Nioche
Hi, I had a look at Any23 some time ago and found that it overlapped with quite a few other projects indeed but could (should?) have either relied on those projects (e.g. parsing and mimetype stuff to Tika) or delegated the functionality altogether (e.g. crawling to Nutch) instead of reinventing