[GitHub] tika pull request: Fix for TIKA-1882

2016-03-01 Thread mkampasi
GitHub user mkampasi opened a pull request: https://github.com/apache/tika/pull/82 Fix for TIKA-1882 The following mime magic has been added to tika-mimetypes.xml to better detect the mime-types below: 1. **application/vnd.ms-cab-compressed (.cab files)** - pattern "MSCF" i

[jira] [Commented] (TIKA-1882) Updating the tika-mimetypes.xml for new mime magic patterns

2016-03-01 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173509#comment-15173509 ] ASF GitHub Bot commented on TIKA-1882: -- GitHub user mkampasi opened a pull request:

[jira] [Commented] (TIKA-1881) On updating mime magic for existing mime types

2016-03-01 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173524#comment-15173524 ] ASF GitHub Bot commented on TIKA-1881: -- GitHub user NamithaGS opened a pull request:

[GitHub] tika pull request: Fix for TIKA-1881

2016-03-01 Thread NamithaGS
GitHub user NamithaGS opened a pull request: https://github.com/apache/tika/pull/83 Fix for TIKA-1881 Updated Mime-Magic for 6 mime types: 1. application/postscript: files begin with pattern "%!PS-Adobe-3.0 EPSF-3.0". 2. application/wordperfect: files begin with pattern "ÿ
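As a rough illustration of what "files begin with pattern" means in the pull request above, the sketch below compares the first bytes of a buffer against the application/postscript pattern quoted there. The class MagicPrefixSketch and the method startsWithMagic are hypothetical names made up for this example; this is not Tika's detector code and does not read tika-mimetypes.xml.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; this is not Tika's detector code.
final class MagicPrefixSketch {

    // Pattern quoted in the pull request description for application/postscript.
    static final byte[] EPS_MAGIC =
            "%!PS-Adobe-3.0 EPSF-3.0".getBytes(StandardCharsets.US_ASCII);

    // Returns true when data starts with every byte of magic (a match at offset 0).
    static boolean startsWithMagic(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed sample content: an EPS header line followed by a comment line.
        byte[] sample = "%!PS-Adobe-3.0 EPSF-3.0\n%%BoundingBox: 0 0 10 10\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(startsWithMagic(sample, EPS_MAGIC));
    }
}
```

A prefix comparison like this only covers magic anchored at offset 0; patterns at other offsets or with masks would need more than this sketch shows.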

[jira] [Commented] (TIKA-1508) Add uniformity to parser parameter configuration

2016-03-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173719#comment-15173719 ] Tim Allison commented on TIKA-1508: --- [~thammegowda], now I remember why I paused on this

[jira] [Commented] (TIKA-1882) Updating the tika-mimetypes.xml for new mime magic patterns

2016-03-01 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174295#comment-15174295 ] Nick Burch commented on TIKA-1882: -- I'm not sure the quicktime pattern is correct - I have

Re: PDFParser in-process mode

2016-03-01 Thread Pei Chen
Thanks Nick. Just a copy and paste error in the email. I was able to figure out how to bypass the JournalParser and just use PDF ones. --Pei On Wed, 24 Feb 2016, Pei Chen wrote: > Does the default pdf parser using auto detect parser require to tika > to run in server mode? No > It seems to tr

[GitHub] tika pull request: XFA support to PDFParser for TIKA-1857 contribu...

2016-03-01 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/tika/pull/74 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled

[jira] [Commented] (TIKA-1857) Enhance PDFParser to extract text from XFA forms

2016-03-01 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174859#comment-15174859 ] ASF GitHub Bot commented on TIKA-1857: -- Github user asfgit closed the pull request at:

[jira] [Resolved] (TIKA-1857) Enhance PDFParser to extract text from XFA forms

2016-03-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1857. --- Resolution: Fixed [~pascal.essiembre], thank you for this pull request! I made a few modifications, b

[jira] [Updated] (TIKA-1857) Enhance PDFParser to extract text from XFA forms

2016-03-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1857: -- Priority: Major (was: Trivial) > Enhance PDFParser to extract text from XFA forms >

[jira] [Commented] (TIKA-1857) Enhance PDFParser to extract text from XFA forms

2016-03-01 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174937#comment-15174937 ] Hudson commented on TIKA-1857: -- UNSTABLE: Integrated in tika-trunk-jdk1.7 #916 (See [https://

[jira] [Commented] (TIKA-1857) Enhance PDFParser to extract text from XFA forms

2016-03-01 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174940#comment-15174940 ] Hudson commented on TIKA-1857: -- UNSTABLE: Integrated in tika-2.x #41 (See [https://builds.apa

[GitHub] tika pull request: Integrate NLTK with Tika fix for TIKA-1876 cont...

2016-03-01 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/tika/pull/80

[jira] [Commented] (TIKA-1876) Integrate Natural Language Toolkit (NLTK) into Tika to perform Named Entity Recognition

2016-03-01 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175077#comment-15175077 ] ASF GitHub Bot commented on TIKA-1876: -- Github user asfgit closed the pull request at:

[jira] [Resolved] (TIKA-1876) Integrate Natural Language Toolkit (NLTK) into Tika to perform Named Entity Recognition

2016-03-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved TIKA-1876. - Resolution: Fixed thanks [~manalishah...@gmail.com] I integrated this! {noformat} [mattman

[jira] [Commented] (TIKA-1876) Integrate Natural Language Toolkit (NLTK) into Tika to perform Named Entity Recognition

2016-03-01 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175188#comment-15175188 ] Hudson commented on TIKA-1876: -- UNSTABLE: Integrated in tika-trunk-jdk1.7 #917 (See [https://