to be quite handy. Ken, Julien
and Chris, please see my replies inline. I've posted them as I received
them within the digest email.
Thanks
Lewis
On Sat, Oct 19, 2013 at 8:31 PM, dev-digest-h...@tika.apache.org wrote:
[DISCUSS] Integrate Apache Any23 into Apache Tika
10047 by: Lewis John Mcgibbney lewis.mcgibb...@gmail.com
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Friday, October 18, 2013 7:30 AM
To: dev@tika.apache.org dev@tika.apache.org
Subject: [DISCUSS] Integrate Apache Any23 into Apache Tika
Hi Tika Devs/PMC,
This thread is aimed at recognizing common ground shared by Any23 and Tika
in an attempt to possibly integrate Any23 into Tika.
First however it will serve a purpose for me to put this into context and
also provide some rationale behind this initiative.
It is my understanding
Hi Lewis,
I haven't had much time to look into Any23, which includes reviewing Markus's
patch for integrating some portions of that into Tika (see
https://issues.apache.org/jira/browse/TIKA-980)
The main challenge I see is that Tika seems to do best as a wrapper for other
parsers, versus
Hi,
I had a look at Any23 some time ago and found that it overlapped with quite
a few other projects indeed but could (should?) have either relied on those
projects (e.g. parsing and mimetype stuff to Tika) or delegated the
functionality altogether (e.g. crawling to Nutch) instead of reinventing