Re: Tika 2.0 Source in Modules or tika-parser

2015-12-14 Thread Nick Burch
On Sun, 13 Dec 2015, Bob Paulin wrote: So in short: Source in tika-parser: Dependencies managed in tika-parser and copied to module. Source in Modules: Dependencies managed in modules and consolidated via maven shade plugin. Conflicting dependencies managed by maven. IIRC there are some util / p

Re: Tika 2.0 Source in Modules or tika-parser

2015-12-14 Thread Bob Paulin
Answers inline. On 12/14/2015 5:24 AM, Nick Burch wrote: On Sun, 13 Dec 2015, Bob Paulin wrote: So in short: Source in tika-parser: Dependencies managed in tika-parser and copied to module. Source in Modules: Dependencies managed in modules and consolidated via maven shade plugin. Conflicting dep

[jira] [Commented] (TIKA-1599) Switch from TagSoup to JSoup

2015-12-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056021#comment-15056021 ] Tim Allison commented on TIKA-1599: --- Ha, ok, good to know. Thank you. If only [~chrisma

[jira] [Updated] (TIKA-1799) Upgrade to POI 3.14-Beta1 when available

2015-12-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1799: -- Priority: Blocker (was: Minor) > Upgrade to POI 3.14-Beta1 when available >

[jira] [Comment Edited] (TIKA-1799) Upgrade to POI 3.14-Beta1 when available

2015-12-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053543#comment-15053543 ] Tim Allison edited comment on TIKA-1799 at 12/14/15 2:18 PM: - T

RE: Tika 2.0 Source in Modules or tika-parser

2015-12-14 Thread Allison, Timothy B.
>> example is the org.apache.tika.parser.utils.CommonsDigester. Could classes >> like this be moved into tika-core? Y, I was not happy with the split I did with that, but I wanted to avoid adding a dependency on commons-codec into core. What do others think...another 180k into the core jar?

RE: Tika 2.0 Source in Modules or tika-parser

2015-12-14 Thread Ken Krugler
> From: Bob Paulin > Sent: December 13, 2015 7:34:03pm PST > To: dev@tika.apache.org > Subject: Tika 2.0 Source in Modules or tika-parser > > Hi, > > I've committed the first module break out to the tika 2.0 branch and I'd like > to discuss the possibility of moving the source code from the tik

Re: Tika 2.0 Source in Modules or tika-parser

2015-12-14 Thread Ray Gauss
I'd vote for a tika-parser-common(s) artifact for common util classes and dependencies. > On Dec 14, 2015, at 10:54 AM, Ken Krugler wrote: > > >> From: Bob Paulin >> Sent: December 13, 2015 7:34:03pm PST >> To: dev@tika.apache.org >> Subject: Tika 2.0 Source in Modules or tika-parser >> >> H

Re: Tika 2.0 Source in Modules or tika-parser

2015-12-14 Thread Nick Burch
On 14/12/15 16:26, Ray Gauss wrote: I'd vote for a tika-parser-common(s) artifact for common util classes and dependencies. That would make sense to me. Nick

Re: Tika 2.0 Source in Modules or tika-parser

2015-12-14 Thread Bob Paulin
So there seems to be a pretty good consensus forming around moving the sources but some differing opinions on where to put shared parser code. tika-parser-commons: Pros: We would be able to keep from adding another dependency to the tika-core project. Cons: All parsers would then require an addit

[jira] [Comment Edited] (TIKA-1799) Upgrade to POI 3.14-Beta1 when available

2015-12-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053543#comment-15053543 ] Tim Allison edited comment on TIKA-1799 at 12/14/15 6:47 PM: - T

[jira] [Updated] (TIKA-1799) Upgrade to POI 3.14-Beta1 when available

2015-12-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1799: -- Priority: Minor (was: Blocker) > Upgrade to POI 3.14-Beta1 when available >

Re: Tika 2.0 Source in Modules or tika-parser

2015-12-14 Thread Nick Burch
On Mon, 14 Dec 2015, Bob Paulin wrote: So there seems to be a pretty good consensus forming around moving the sources but some differing opinions on where to put shared parser code. I know it'll be a bit dull and some work, but... Could someone put together a list (probably in the wiki or on j

[jira] [Created] (TIKA-1812) Tika 2.0 Move Sources to Modules

2015-12-14 Thread Bob Paulin (JIRA)
Bob Paulin created TIKA-1812: Summary: Tika 2.0 Move Sources to Modules Key: TIKA-1812 URL: https://issues.apache.org/jira/browse/TIKA-1812 Project: Tika Issue Type: Improvement Repor

[jira] [Updated] (TIKA-1812) Tika 2.0 Move Sources to Modules

2015-12-14 Thread Bob Paulin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Paulin updated TIKA-1812: - Affects Version/s: 2.0 > Tika 2.0 Move Sources to Modules > > >

[jira] [Commented] (TIKA-1812) Tika 2.0 Move Sources to Modules

2015-12-14 Thread Bob Paulin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056642#comment-15056642 ] Bob Paulin commented on TIKA-1812: -- Output of the jdeps utility in the JDK: org.apache.tika.par

Re: Tika 2.0 Source in Modules or tika-parser

2015-12-14 Thread Bob Paulin
Created https://issues.apache.org/jira/browse/TIKA-1812 Also included the output from jdeps, which shows a package-by-package breakdown of dependencies. Is org.apache.tika.parser.utils the only shared package or are there others? We can probably move this discussion to the JIRA. - Bob On Mon, D

[jira] [Commented] (TIKA-1812) Tika 2.0 Move Sources to Modules

2015-12-14 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056671#comment-15056671 ] Tim Allison commented on TIKA-1812: --- bq. Is org.apache.tika.parser.utils the only shared