NLP/NER is as high a priority to me as the OCR stuff. We have a whole meta
framework for doing NER/NLP with NERRecogniser and really cool TensorFlow and
other stuff.
Hoping 2.0 can help solve this! ☺
++
Chris Mattmann, Ph.D.
Chief Ar
On Mon, 19 Sep 2016, Bob Paulin wrote:
I think it's a good thing to discuss. I know there are other features
that are targeted for 2.0. Do we have a general sense of where those
features are at?
I think the big one we need to crack is allowing multiple parsers to run
against a file. OCR is
I think that could work! I've also created a custom filter that might help:
https://issues.apache.org/jira/browse/TIKA-2083?filter=12338448
Logic is as follows:
project = TIKA AND affectedVersion = 2.0 AND priority >= Blocker AND
status != Closed AND status != Fixed
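
For anyone who wants to run the same query outside the JIRA UI, here is a minimal sketch that percent-encodes the JQL above into a search URL. The JQL text is copied verbatim from the filter logic. The REST path in `BASE_URL` and the helper name `build_search_url` are assumptions made for illustration, not something stated in this thread, so check them against the server before relying on them.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# JQL copied verbatim from the filter logic above.
JQL = (
    "project = TIKA AND affectedVersion = 2.0 AND priority >= Blocker AND "
    "status != Closed AND status != Fixed"
)

# Assumption: the usual JIRA REST search path; adjust if the server differs.
BASE_URL = "https://issues.apache.org/jira/rest/api/2/search"


def build_search_url(jql: str, max_results: int = 50) -> str:
    """Return a search URL with the JQL percent-encoded as a query parameter."""
    query = urlencode({"jql": jql, "maxResults": max_results})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # Print the URL only; this sketch makes no network request.
    print(build_search_url(JQL))
```

Encoding the query with `urlencode` keeps operators such as `>=` and `!=` intact when the URL is pasted into a browser or passed to an HTTP client.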
- Bob
On 9/19/2016 1:4
> Should we create a tika-2_0-blocker label to differentiate from regular
> "blockers"?
How about a single master issue: TIKA-2085.
What else do we need to add?
>> 1) Implement various strategies for chaining multiple parsers against
>> individual files. Much of this has been implemented, but what's holding us
>> up on this one (I think?) is a resettable outputstream.
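
To make the "resettable output" point concrete, here is a minimal sketch of one chaining strategy: try parsers in order and throw away any partial output when one fails. It is plain Python, not Tika's API. The `Parser` signature, `parse_with_fallback`, and the two toy parsers are all hypothetical names invented for this illustration.

```python
import io
from typing import Callable, Iterable

# Hypothetical parser signature: reads raw bytes, writes extracted text to `out`.
Parser = Callable[[bytes, io.StringIO], None]


def parse_with_fallback(data: bytes, parsers: Iterable[Parser]) -> str:
    """Try each parser in order; discard partial output when one fails."""
    errors = []
    for parser in parsers:
        out = io.StringIO()  # fresh buffer per attempt acts as the "reset"
        try:
            parser(data, out)
        except Exception as exc:  # the failed parser may have written partial text
            errors.append(exc)
            continue  # drop `out`, so partial text never reaches the caller
        return out.getvalue()
    raise RuntimeError(f"all parsers failed: {errors}")


def broken_parser(data: bytes, out: io.StringIO) -> None:
    """Toy parser that writes some output and then fails."""
    out.write("partial ")
    raise ValueError("cannot parse")


def plain_text_parser(data: bytes, out: io.StringIO) -> None:
    """Toy parser that decodes the bytes as UTF-8 text."""
    out.write(data.decode("utf-8"))


if __name__ == "__main__":
    print(parse_with_fallback(b"hello", [broken_parser, plain_text_parser]))
```

Buffering each attempt is the simplest way to get a reset, at the cost of holding one parser's full output in memory. A streaming design would need a different mechanism.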
>I think we need a JIRA for this. Are there any existing design ideas on how
>this wo
pache Help Wanted page as well.
Thanks!
Cheers,
Tim
-----Original Message-----
From: Bob Paulin [mailto:b...@bobpaulin.com]
Sent: Monday, September 19, 2016 10:32 AM
To: dev@tika.apache.org
Subject: Re: Plans for the first Tika 2.0 release
Hi,
I think it's a good thing to discu
oment, perhaps after we get 1.14 out, I can
turn to 2.0-specific development.
What else do we have to do? Anyone else have some time?
Cheers,
Tim
-----Original Message-----
From: Bob Paulin [mailto:b...@bobpaulin.com]
Sent: Monday, September 19, 2016 10:32 AM
To: dev@tika.apache.org
Subject: R
Hi,
I think it's a good thing to discuss. I know there are other features
that are targeted for 2.0. Do we have a general sense of where those
features are at? My concern is we have been dual maintaining 2 branches
for about 9 months. I think the longer we do this the more risk there
is t