[jira] [Created] (TIKA-1876) Integrate Natural Language Toolkit (NLTK) into Tika to perform Named Entity Recognition

2016-02-25 Thread Manali Shah (JIRA)
Manali Shah created TIKA-1876: - Summary: Integrate Natural Language Toolkit (NLTK) into Tika to perform Named Entity Recognition Key: TIKA-1876 URL: https://issues.apache.org/jira/browse/TIKA-1876 Project

[GitHub] tika pull request: Tika 1875

2016-02-25 Thread prasadns14
GitHub user prasadns14 opened a pull request: https://github.com/apache/tika/pull/78 Tika 1875 Updated netcdf mime type magic number File - tika-mimetypes.xml You can merge this pull request into a Git repository by running: $ git pull https://github.com/prasadns14/tika TIK

[jira] [Created] (TIKA-1875) Updating tika-mimetypes.xml to detect .NC files

2016-02-25 Thread Prasad Nagaraj Subramanya (JIRA)
Prasad Nagaraj Subramanya created TIKA-1875: --- Summary: Updating tika-mimetypes.xml to detect .NC files Key: TIKA-1875 URL: https://issues.apache.org/jira/browse/TIKA-1875 Project: Tika

[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-25 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168291#comment-15168291 ] Luis Filipe Nassif commented on TIKA-1865: -- Also, what do you think about includin

[jira] [Updated] (TIKA-1857) Enhance PDFParser to extract text from XFA forms

2016-02-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1857: -- Attachment: govdocs1_xfas.zip 194 xfas from govdocs1 as exported with PDFBox 2.0 (trunk built from within

[jira] [Commented] (TIKA-1855) TIka 2.0 - Move shared test-code back to tika-core and distribute test files to parser modules

2016-02-25 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168075#comment-15168075 ] Nick Burch commented on TIKA-1855: -- I'm not actually sure we need to do the unzipping thin

[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-25 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167988#comment-15167988 ] Luis Filipe Nassif commented on TIKA-1865: -- Hi [~talli...@apache.org]! I think MA

[jira] [Commented] (TIKA-1855) TIka 2.0 - Move shared test-code back to tika-core and distribute test files to parser modules

2016-02-25 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167891#comment-15167891 ] Ken Krugler commented on TIKA-1855: --- The things I don't like about this approach are that

Re: parallel dev on trunk and 2.x?

2016-02-25 Thread Mattmann, Chris A (3980)
+1 I haven’t fully moved over to 2.x yet b/c I haven’t honestly had time to catch up. I suppose after my class in May I will have time to catch up then and I can focus more on 2.x then. But for me I am doing all my work in 1.x now so keeping up to date would be great. +

parallel dev on trunk and 2.x?

2016-02-25 Thread Allison, Timothy B.
All, Do I understand correctly that we should be committing most changes to both trunk and 2.x? Obviously, the 2.x commits are for 2.x. :) Or will merge really, actually, truly work at some point in the future to merge changes in trunk to 2.x? Best, Tim

[jira] [Commented] (TIKA-1870) Relocating RichTextContentHandler into tika-core from tika-server

2016-02-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167640#comment-15167640 ] Hudson commented on TIKA-1870: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #915 (See [https://b

[jira] [Commented] (TIKA-1874) Fix rare npe in XWPFListManager

2016-02-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167620#comment-15167620 ] Hudson commented on TIKA-1874: -- SUCCESS: Integrated in tika-2.x #31 (See [https://builds.apac

[jira] [Commented] (TIKA-1870) Relocating RichTextContentHandler into tika-core from tika-server

2016-02-25 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167570#comment-15167570 ] ASF GitHub Bot commented on TIKA-1870: -- Github user asfgit closed the pull request at:

[jira] [Resolved] (TIKA-1870) Relocating RichTextContentHandler into tika-core from tika-server

2016-02-25 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1870. -- Resolution: Fixed Thanks for preparing patches for all this work. Merged and pushed!

[GitHub] tika pull request: Refector RichTextContentHandler for TIKA-1870 c...

2016-02-25 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/tika/pull/77

[jira] [Commented] (TIKA-1874) Fix rare npe in XWPFListManager

2016-02-25 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167550#comment-15167550 ] Hudson commented on TIKA-1874: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #914 (See [https://b

[jira] [Updated] (TIKA-1874) Fix rare npe in XWPFListManager

2016-02-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1874: -- Description: Many thanks to [~centic]'s [CommonCrawlDocumentDownload|https://github.com/centic9/CommonCr

[jira] [Resolved] (TIKA-1874) Fix rare npe in XWPFListManager

2016-02-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1874. --- Resolution: Fixed

[jira] [Created] (TIKA-1874) Fix rare npe in XWPFListManager

2016-02-25 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1874: - Summary: Fix rare npe in XWPFListManager Key: TIKA-1874 URL: https://issues.apache.org/jira/browse/TIKA-1874 Project: Tika Issue Type: Bug Reporter: Ti

[jira] [Commented] (TIKA-1870) Relocating RichTextContentHandler into tika-core from tika-server

2016-02-25 Thread John Patrick (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167296#comment-15167296 ] John Patrick commented on TIKA-1870: Added JavaDoc and Unit Test, although I'm assuming

[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-25 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167231#comment-15167231 ] Nick Burch commented on TIKA-1865: -- IIRC it needs the "fixed length properties" support to

[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167211#comment-15167211 ] Tim Allison commented on TIKA-1865: --- Good to hear from you, [~lfcnassif]! I've only look

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-02-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167208#comment-15167208 ] Tim Allison commented on TIKA-1607: --- Aside from XMP, I can't think of an example where we

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-02-25 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167135#comment-15167135 ] Ray Gauss II commented on TIKA-1607: I know there can be multiple XMP packets in a sing

[jira] [Commented] (TIKA-1855) TIka 2.0 - Move shared test-code back to tika-core and distribute test files to parser modules

2016-02-25 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167123#comment-15167123 ] Nick Burch commented on TIKA-1855: -- Currently, we have most test documents in Tika Parsers

[jira] [Commented] (TIKA-1873) Test Cases failed when tika-mimetypes.xml is changed

2016-02-25 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167012#comment-15167012 ] Nick Burch commented on TIKA-1873: -- Interesting stuff! I'd skip most container-based forma