Re: Plans for the first Tika 2.0 release

2016-09-21 Thread Mattmann, Chris A (3980)
NLP/NER is as high a priority to me as the OCR stuff. We have a whole meta framework for doing NER/NLP with NERRecogniser and really cool TensorFlow and other stuff. Hoping 2.0 can help solve this! ☺ ++ Chris Mattmann, Ph.D. Chief Ar

Re: Plans for the first Tika 2.0 release

2016-09-21 Thread Nick Burch
On Mon, 19 Sep 2016, Bob Paulin wrote: I think it's a good thing to discuss. I know there are other features that are targeted for 2.0. Do we have a general sense of where those features are at? I think the big one we need to crack is allowing multiple parsers to run against a file. OCR is

Re: Plans for the first Tika 2.0 release

2016-09-19 Thread Bob Paulin
I think that could work! I've also created a custom filter that might help https://issues.apache.org/jira/browse/TIKA-2083?filter=12338448 Logic is as follows: project = TIKA AND affectedVersion = 2.0 AND priority >= Blocker AND status != Closed AND status != Fixed - Bob On 9/19/2016 1:4

RE: Plans for the first Tika 2.0 release

2016-09-19 Thread Allison, Timothy B.
> Should we create a tika-2_0-blocker label to differentiate from regular "blockers"?
How about a single master issue: TIKA-2085. What else do we need to add?

RE: Plans for the first Tika 2.0 release

2016-09-19 Thread Allison, Timothy B.
>> 1) Implement various strategies for chaining multiple parsers against individual files. Much of this has been implemented, but what's holding us up on this one (I think?) is a resettable outputstream.
> I think we need a JIRA for this. Is there any existing design ideas on how this wo
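
For readers unfamiliar with why a resettable output stream matters when chaining parsers, here is a minimal, language-agnostic sketch in Python (Tika itself is Java, and this is not Tika's design or API). It assumes a simple fallback strategy: each parser writes into a shared buffer, and if a parser fails partway through, the buffer is reset so the next parser starts from clean output. The names `ResettableBuffer`, `parse_with_fallback`, `failing_parser`, and `plain_parser` are all hypothetical.

```python
import io


class ResettableBuffer:
    """Hypothetical stand-in for a resettable output stream."""

    def __init__(self):
        self._buf = io.BytesIO()

    def write(self, data: bytes) -> int:
        return self._buf.write(data)

    def reset(self) -> None:
        # Discard everything written so far and rewind to the start.
        self._buf.seek(0)
        self._buf.truncate(0)

    def getvalue(self) -> bytes:
        return self._buf.getvalue()


def parse_with_fallback(parsers, data: bytes) -> bytes:
    """Try each parser in order; reset the output after a failed attempt."""
    out = ResettableBuffer()
    for parser in parsers:
        try:
            parser(data, out)
            return out.getvalue()
        except Exception:
            # Partial output from the failed parser must not leak
            # into the next parser's result.
            out.reset()
    raise ValueError("no parser succeeded")


def failing_parser(data: bytes, out: ResettableBuffer) -> None:
    # Simulates a parser that emits some output and then fails.
    out.write(b"partial ")
    raise RuntimeError("simulated parser failure")


def plain_parser(data: bytes, out: ResettableBuffer) -> None:
    # Simulates a parser that succeeds.
    out.write(data.upper())


if __name__ == "__main__":
    print(parse_with_fallback([failing_parser, plain_parser], b"abc"))
```

Without the `reset()` call, the second parser's output would be appended to the first parser's partial output, which is the problem a resettable stream is meant to avoid in this sketch.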

Re: Plans for the first Tika 2.0 release

2016-09-19 Thread Bob Paulin
pache Help Wanted page as well. Thanks! Cheers, Tim -Original Message- From: Bob Paulin [mailto:b...@bobpaulin.com] Sent: Monday, September 19, 2016 10:32 AM To: dev@tika.apache.org Subject: Re: Plans for the first Tika 2.0 release Hi, I think it's a good thing to discu

RE: Plans for the first Tika 2.0 release

2016-09-19 Thread Allison, Timothy B.
oment, perhaps after we get 1.14 out, I can turn to 2.0-specific development. What else do we have to do? Anyone else have some time? Cheers, Tim -Original Message- From: Bob Paulin [mailto:b...@bobpaulin.com] Sent: Monday, September 19, 2016 10:32 AM To: dev@tika.apache.org Subject: R

Re: Plans for the first Tika 2.0 release

2016-09-19 Thread Bob Paulin
Hi, I think it's a good thing to discuss. I know there are other features that are targeted for 2.0. Do we have a general sense of where those features are at? My concern is we have been dual maintaining 2 branches for about 9 months. I think the longer we do this the more risk there is t