[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents

2016-09-19 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504643#comment-15504643 ] Tim Allison commented on TIKA-2069: --- Just realized that we might want to handle extractio

[jira] [Updated] (TIKA-2084) Create resettable OutputStream to support "backoff on exception" strategy

2016-09-19 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2084: -- Description: If we want a backoff on exception strategy, "try xmlparser, if that fails, try the TXTParser
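The "backoff on exception" idea in TIKA-2084 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Tika's API: `Extractor`, `ResettableOutputStream` and `parseWithBackoff` are hypothetical names, and the two lambdas merely stand in for a strict XML parser and a plain-text fallback. Each attempt writes into an in-memory buffer. A failed attempt's partial output is discarded with `reset()`, and only the successful attempt's output is committed to the real stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class BackoffDemo {

    /** Hypothetical stand-in for a parser: writes extracted text to out, may throw. */
    interface Extractor {
        void extract(byte[] input, OutputStream out) throws Exception;
    }

    /**
     * Hypothetical resettable stream: buffers all writes in memory so nothing
     * reaches the target until commit(); reset() drops a failed attempt's output.
     */
    static final class ResettableOutputStream extends OutputStream {
        private final OutputStream target;
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        ResettableOutputStream(OutputStream target) { this.target = target; }

        @Override public void write(int b) { buffer.write(b); }
        @Override public void write(byte[] b, int off, int len) { buffer.write(b, off, len); }

        /** Discard whatever the current attempt has written so far. */
        void reset() { buffer.reset(); }

        /** Copy the buffered output to the real stream and clear the buffer. */
        void commit() throws IOException {
            buffer.writeTo(target);
            target.flush();
            buffer.reset();
        }
    }

    /** Try each extractor in order; keep only the output of the first one that succeeds. */
    static boolean parseWithBackoff(byte[] input, List<Extractor> chain, OutputStream out)
            throws IOException {
        ResettableOutputStream ros = new ResettableOutputStream(out);
        for (Extractor e : chain) {
            try {
                e.extract(input, ros);
                ros.commit();
                return true;
            } catch (Exception ex) {
                ros.reset(); // back off: drop partial output, try the next extractor
            }
        }
        return false; // every extractor failed; nothing was written to out
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "<a>broken".getBytes(StandardCharsets.UTF_8);
        // Stand-in for a strict XML parser that writes some output, then fails.
        Extractor strictXml = (in, o) -> {
            o.write("partial-xml-".getBytes(StandardCharsets.UTF_8));
            throw new IllegalStateException("malformed XML");
        };
        // Stand-in for a plain-text fallback that just echoes the bytes.
        Extractor plainText = (in, o) -> o.write(in);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        boolean ok = parseWithBackoff(doc, Arrays.asList(strictXml, plainText), sink);
        System.out.println(ok + " " + new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Buffering everything in memory is the simplest design, but it trades memory for correctness on large documents. As Luis Filipe Nassif's comment below suggests, making the reset optional would let callers skip the buffer when they don't need backoff.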

[jira] [Commented] (TIKA-2084) Create resettable OutputStream to support "backoff on exception" strategy

2016-09-19 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504592#comment-15504592 ] Tim Allison commented on TIKA-2084: --- Good point. Thank you. > Create resettable OutputS

Re: Plans for the first Tika 2.0 release

2016-09-19 Bob Paulin
I think that could work! I've also created a custom filter that might help https://issues.apache.org/jira/browse/TIKA-2083?filter=12338448 Logic is as follows: project = TIKA AND affectedVersion = 2.0 AND priority >= Blocker AND status != Closed AND status != Fixed - Bob On 9/19/2016 1:4

[jira] [Commented] (TIKA-2084) Create resettable OutputStream to support "backoff on exception" strategy

2016-09-19 Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504321#comment-15504321 ] Luis Filipe Nassif commented on TIKA-2084: -- I think the reset could be optional, b

RE: Plans for the first Tika 2.0 release

2016-09-19 Allison, Timothy B.
> Should we create a tika-2_0-blocker label to differentiate from regular > "blockers"? How about a single master issue: TIKA-2085. What else do we need to add?

[jira] [Updated] (TIKA-1509) Create configurable strategies for composite parsers

2016-09-19 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1509: -- Issue Type: Sub-task (was: Improvement) Parent: TIKA-2085 > Create configurable strategies for c

[jira] [Updated] (TIKA-2084) Create resettable OutputStream to support "backoff on exception" strategy

2016-09-19 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2084: -- Issue Type: New Feature (was: Sub-task) Parent: (was: TIKA-1509) > Create resettable OutputS

[jira] [Updated] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-09-19 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1607: -- Issue Type: Sub-task (was: Improvement) Parent: TIKA-2085 > Introduce new arbitrary object key/v

[jira] [Updated] (TIKA-1974) Tika 2.0 - remove deprecated metadata properties

2016-09-19 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1974: -- Issue Type: Sub-task (was: Task) Parent: TIKA-2085 > Tika 2.0 - remove deprecated metadata prope

[jira] [Updated] (TIKA-2083) Tika 2.0 - Audit master branch against 2.x branch

2016-09-19 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2083: -- Issue Type: Sub-task (was: Task) Parent: TIKA-2085 > Tika 2.0 - Audit master branch against 2.x

[jira] [Created] (TIKA-2085) Tika 2.0 -- Overarching task list for what we need to do before 2.0

2016-09-19 Tim Allison (JIRA)
Tim Allison created TIKA-2085: - Summary: Tika 2.0 -- Overarching task list for what we need to do before 2.0 Key: TIKA-2085 URL: https://issues.apache.org/jira/browse/TIKA-2085 Project: Tika Iss

RE: Plans for the first Tika 2.0 release

2016-09-19 Allison, Timothy B.
>> 1) Implement various strategies for chaining multiple parsers against >> individual files. Much of this has been implemented, but what's holding us >> up on this one (I think?) is a resettable outputstream. >I think we need a JIRA for this. Is there any existing design ideas on how >this wo

[jira] [Created] (TIKA-2084) Create resettable OutputStream to support "backoff on exception" strategy

2016-09-19 Tim Allison (JIRA)
Tim Allison created TIKA-2084: - Summary: Create resettable OutputStream to support "backoff on exception" strategy Key: TIKA-2084 URL: https://issues.apache.org/jira/browse/TIKA-2084 Project: Tika

[jira] [Created] (TIKA-2083) Tika 2.0 - Audit master branch against 2.x branch

2016-09-19 Bob Paulin (JIRA)
Bob Paulin created TIKA-2083: Summary: Tika 2.0 - Audit master branch against 2.x branch Key: TIKA-2083 URL: https://issues.apache.org/jira/browse/TIKA-2083 Project: Tika Issue Type: Task Aff

Re: Plans for the first Tika 2.0 release

2016-09-19 Bob Paulin
Thanks Tim! Replies in line. - Bob On 9/19/2016 12:33 PM, Allison, Timothy B. wrote: Bob, As always, thank you for driving 2.0! My concern is we have been dual maintaining 2 branches for about 9 months. I think the longer we do this the more risk there is that we miss something. Agreed.

RE: Plans for the first Tika 2.0 release

2016-09-19 Allison, Timothy B.
Bob, As always, thank you for driving 2.0! > My concern is we have been dual maintaining 2 branches for about 9 months. I > think the longer we do this the more risk there is that we miss something. Agreed. I think we're already missing a few things. > Would it make sense to at least put

[jira] [Commented] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

2016-09-19 Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504023#comment-15504023 ] Nick Burch commented on TIKA-1997: -- Running your file through the openssl tool {{ asn1pars

[jira] [Comment Edited] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

2016-09-19 Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504023#comment-15504023 ] Nick Burch edited comment on TIKA-1997 at 9/19/16 5:07 PM: --- Runni

Re: Plans for the first Tika 2.0 release

2016-09-19 Bob Paulin
Hi, I think it's a good thing to discuss. I know there are other features that are targeted for 2.0. Do we have a general sense of where those features are at? My concern is we have been dual maintaining 2 branches for about 9 months. I think the longer we do this the more risk there is t

tika-2.x-windows - Build # 48 - Still Failing

2016-09-19 Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x-windows (build #48) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x-windows/48/ to view the results.

[jira] [Commented] (TIKA-2015) MAPIMessage String fileName constructor leaves file open

2016-09-19 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503616#comment-15503616 ] Hudson commented on TIKA-2015: -- FAILURE: Integrated in Jenkins build tika-2.x-windows #48 (See

[jira] [Commented] (TIKA-2051) Upgrade to PDFBox 2.0.3 when available

2016-09-19 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503579#comment-15503579 ] Hudson commented on TIKA-2051: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1102 (See [h

[jira] [Commented] (TIKA-2015) MAPIMessage String fileName constructor leaves file open

2016-09-19 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503577#comment-15503577 ] Tim Allison commented on TIKA-2015: --- Doh. Typo in commit message. Should have been TIKA

[jira] [Commented] (TIKA-2015) MAPIMessage String fileName constructor leaves file open

2016-09-19 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503535#comment-15503535 ] Hudson commented on TIKA-2015: -- SUCCESS: Integrated in Jenkins build tika-2.x #144 (See [http

[jira] [Commented] (TIKA-2082) Upgrade to PDFBox 2.0.3

2016-09-19 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503467#comment-15503467 ] Tim Allison commented on TIKA-2082: --- No need to apologize whatsoever. Thank you for the

[jira] [Updated] (TIKA-2045) TIKA crashes / runs out of memory on simple PDF

2016-09-19 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2045: -- Fix Version/s: 1.14 2.0 > TIKA crashes / runs out of memory on simple PDF > --

[jira] [Resolved] (TIKA-2045) TIKA crashes / runs out of memory on simple PDF

2016-09-19 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2045. --- Resolution: Fixed Upgraded to PDFBox 2.0.3. > TIKA crashes / runs out of memory on simple PDF > --

[jira] [Resolved] (TIKA-2051) Upgrade to PDFBox 2.0.3 when available

2016-09-19 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2051. --- Resolution: Fixed Fix Version/s: 1.14 2.0 > Upgrade to PDFBox 2.0.3 when avai

[jira] [Commented] (TIKA-2082) Upgrade to PDFBox 2.0.3

2016-09-19 Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503458#comment-15503458 ] Luis Filipe Nassif commented on TIKA-2082: -- Sorry Tim, did not see Tika-2051 > Up

[jira] [Resolved] (TIKA-2082) Upgrade to PDFBox 2.0.3

2016-09-19 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2082. --- Resolution: Duplicate Fix Version/s: 2.0 Building locally now before I commit (should be 10-15 m

[jira] [Created] (TIKA-2082) Upgrade to PDFBox 2.0.3

2016-09-19 Luis Filipe Nassif (JIRA)
Luis Filipe Nassif created TIKA-2082: Summary: Upgrade to PDFBox 2.0.3 Key: TIKA-2082 URL: https://issues.apache.org/jira/browse/TIKA-2082 Project: Tika Issue Type: Improvement

Plans for the first Tika 2.0 release

2016-09-19 Sergey Beryozkin
Hi All Back in May I updated one of our CXF demos on the master 3.2 branch to depend on Tika 2.0 SNAPSHOT to verify the new module system works well. It is feasible that CXF 3.2.0 may be released by the end of the year or early next year. As far as Tika 2.0 dependencies are concerned it will be