[jira] [Created] (TIKA-1886) Updating tika-mimetypes.xml to detect .hfa files

2016-03-02 Thread Nandan Chandrashekar (JIRA)
Nandan Chandrashekar created TIKA-1886: Summary: Updating tika-mimetypes.xml to detect .hfa files Key: TIKA-1886 URL: https://issues.apache.org/jira/browse/TIKA-1886 Project: Tika Issue

Re: Need suggestion on file type .HFA to be added Tika.

2016-03-02 Thread Nandan Padar Chandrashekar
Thanks Nick and Prof Chris. Will update tika-mimetypes.xml for the same. Regards Nandan On Wed, Mar 2, 2016 at 8:32 PM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote: > I agree with Nick’s replies here

[jira] [Resolved] (TIKA-1885) Updated tika-mimestype.xml and a detector to identify new types of files based on analysis

2016-03-02 Thread Adesh Gupta (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adesh Gupta resolved TIKA-1885. Resolution: Fixed. Added a custom detector and an updated tika-mimetypes.xml file. > Updated tika-mimest

Re: Need suggestion on file type .HFA to be added Tika.

2016-03-02 Thread Mattmann, Chris A (3980)
I agree with Nick’s replies here ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris

Re: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

2016-03-02 Thread Mattmann, Chris A (3980)
yeah maybe you’re right thanks for fixing it guys

[jira] [Commented] (TIKA-1782) XHTMLContentHandler doesn't pass attributes of html element

2016-03-02 Thread James Sullivan (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177104#comment-15177104 ] James Sullivan commented on TIKA-1782: Is this related http://stackoverflow.com/ques

[jira] [Commented] (TIKA-1882) Updating the tika-mimetypes.xml for new mime magic patterns

2016-03-02 Thread Manisha Kampasi (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177076#comment-15177076 ] Manisha Kampasi commented on TIKA-1882: Hi Nick, I based my analysis on the followi

[jira] [Created] (TIKA-1885) Updated tika-mimestype.xml and a detector to identify new types of files based on analysis

2016-03-02 Thread Adesh Gupta (JIRA)
Adesh Gupta created TIKA-1885: Summary: Updated tika-mimestype.xml and a detector to identify new types of files based on analysis Key: TIKA-1885 URL: https://issues.apache.org/jira/browse/TIKA-1885 Proj

Re: Need suggestion on file type .HFA to be added Tika.

2016-03-02 Thread Nick Burch
On Wed, 2 Mar 2016, Nandan Padar Chandrashekar wrote: Identified (Hierarchical File Architecture) HFA file format which is not presently being identified through Tika. extension : *.hfa Header tag contains string EHFA_HEADER_TAG. Looks fine for adding to Tika to me. Should this be considered

[jira] [Commented] (TIKA-1663) Add a DigestingParser to add MD5/SHA-X hashes as fields in Metadata

2016-03-02 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176652#comment-15176652 ] Nick Burch commented on TIKA-1663: The other parser decorators are specified with options
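TIKA-1663 proposes adding MD5/SHA-X hashes as Metadata fields. The sketch below is plain JDK code, not Tika's DigestingParser API. It shows one way to compute several digests in a single pass over a stream and expose them as hex strings. The `digest:<ALG>` key names are hypothetical and stand in for whatever metadata keys the real decorator uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: computes several digests in one pass and returns them keyed by a
// HYPOTHETICAL "digest:<ALG>" name. Not Tika's DigestingParser API.
class DigestSketch {
    static Map<String, String> digests(InputStream in, String... algorithms)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest[] mds = new MessageDigest[algorithms.length];
        for (int i = 0; i < algorithms.length; i++) {
            mds[i] = MessageDigest.getInstance(algorithms[i]); // e.g. "MD5", "SHA-256"
        }
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (MessageDigest md : mds) {
                md.update(buf, 0, n); // feed every digest from the same buffer
            }
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < mds.length; i++) {
            fields.put("digest:" + algorithms[i], toHex(mds[i].digest()));
        }
        return fields;
    }

    // Lower-case hex; %02x formats a Byte as an unsigned two-digit value.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(digests(in, "MD5", "SHA-256"));
    }
}
```

Reading the stream once and updating each `MessageDigest` from the same buffer avoids buffering the whole document just to hash it several times.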

[jira] [Updated] (TIKA-1866) Out of memory error on Word document

2016-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1866: Priority: Major (was: Minor) > Out of memory error on Word document

[jira] [Updated] (TIKA-1866) Out of memory error on Word document

2016-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1866: Attachment: U77VVDMDHSQ6M2CLZH3AM2IEZOIUEJWI.pptx May have found a similar problem with pptx. {noformat}

[jira] [Issue Comment Deleted] (TIKA-1883) Identification of Mime Type for Empty Files

2016-03-02 Thread Aditya Ramachandra Desai (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Ramachandra Desai updated TIKA-1883: Comment: was deleted (was: The updated code is available at https://github.co

[jira] [Issue Comment Deleted] (TIKA-1884) Updating Tika Mime Repository

2016-03-02 Thread Aditya Ramachandra Desai (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Ramachandra Desai updated TIKA-1884: Comment: was deleted (was: The updated code is available at https://github.co

[jira] [Resolved] (TIKA-1884) Updating Tika Mime Repository

2016-03-02 Thread Aditya Ramachandra Desai (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Ramachandra Desai resolved TIKA-1884. Resolution: Fixed. The updated code is available at https://github.com/Rashm

[jira] [Resolved] (TIKA-1883) Identification of Mime Type for Empty Files

2016-03-02 Thread Aditya Ramachandra Desai (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Ramachandra Desai resolved TIKA-1883. Resolution: Fixed. The updated code is available at https://github.com/Rashm

[jira] [Commented] (TIKA-1884) Updating Tika Mime Repository

2016-03-02 Thread Aditya Ramachandra Desai (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176178#comment-15176178 ] Aditya Ramachandra Desai commented on TIKA-1884: The updated code is avail

[jira] [Commented] (TIKA-1883) Identification of Mime Type for Empty Files

2016-03-02 Thread Aditya Ramachandra Desai (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176175#comment-15176175 ] Aditya Ramachandra Desai commented on TIKA-1883: The updated code is avail

tika-2.x - Build # 44 - Still Failing

2016-03-02 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x (build #44) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x/44/ to view the results.

[jira] [Commented] (TIKA-1857) Enhance PDFParser to extract text from XFA forms

2016-03-02 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175818#comment-15175818 ] Hudson commented on TIKA-1857: SUCCESS: Integrated in tika-trunk-jdk1.7 #919 (See [https://b

[jira] [Resolved] (TIKA-1816) Lenient testing for NamedEntityParser

2016-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1816. Resolution: Fixed. Works locally, at least. Thank you! > Lenient testing for NamedEntityParser

[jira] [Commented] (TIKA-1816) Lenient testing for NamedEntityParser

2016-03-02 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175785#comment-15175785 ] ASF GitHub Bot commented on TIKA-1816: Github user asfgit closed the pull request at:

[GitHub] tika pull request: TIKA1816 : NER model download via maven proxy (...

2016-03-02 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/tika/pull/84

[jira] [Commented] (TIKA-1657) Allow easier XML serialization of TikaConfig

2016-03-02 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175769#comment-15175769 ] Hudson commented on TIKA-1657: FAILURE: Integrated in tika-2.x #43 (See [https://builds.apac

tika-2.x - Build # 43 - Failure

2016-03-02 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x (build #43) Status: Failure Check console output at https://builds.apache.org/job/tika-2.x/43/ to view the results.

[jira] [Resolved] (TIKA-1657) Allow easier XML serialization of TikaConfig

2016-03-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1657. Resolution: Fixed. I moved Nick's code from tika-example to tika-core, and I made it available via tika
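TIKA-1657 concerns serializing a TikaConfig to XML. Tika config files use a `<properties><parsers><parser class="..."/></parsers></properties>` layout. The sketch below builds that shape with the JDK's DOM and `Transformer` APIs. It is not the serializer that was moved into tika-core, whose API is not shown in this message. The list of parser class names is simply supplied by the caller (the helper name `toConfigXml` is hypothetical).

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch only: writes a tika-config-shaped XML document listing parser classes.
// Hypothetical helper; not Tika's own serializer.
class ConfigXmlSketch {
    static String toConfigXml(List<String> parserClasses) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element properties = doc.createElement("properties");
        doc.appendChild(properties);
        Element parsers = doc.createElement("parsers");
        properties.appendChild(parsers);
        for (String cls : parserClasses) {
            Element parser = doc.createElement("parser");
            parser.setAttribute("class", cls); // fully qualified parser class name
            parsers.appendChild(parser);
        }
        // Identity transform = serialize the DOM tree to text.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toConfigXml(List.of("org.apache.tika.parser.DefaultParser")));
    }
}
```

Building the document through DOM rather than string concatenation keeps attribute values correctly escaped.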

[jira] [Commented] (TIKA-1657) Allow easier XML serialization of TikaConfig

2016-03-02 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175716#comment-15175716 ] Hudson commented on TIKA-1657: UNSTABLE: Integrated in tika-trunk-jdk1.7 #918 (See [https://

[jira] [Commented] (TIKA-1860) Tika 2.0 - Create Module OSGi implementations to replace tika-bundle

2016-03-02 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175680#comment-15175680 ] Hudson commented on TIKA-1860: UNSTABLE: Integrated in tika-2.x #42 (See [https://builds.apa

RE: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

2016-03-02 Thread Allison, Timothy B.
There's a chance you hadn't merged my breaking commit? -----Original Message----- From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Wednesday, March 02, 2016 9:27 AM To: dev@tika.apache.org Subject: Re: trunk build failing in bundle --, cxf class not found for GrobidRES

Re: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

2016-03-02 Thread Mattmann, Chris A (3980)
Wow, this is super odd. The last thing I committed was NLTK, and it built fine locally; I tested before committing.

RE: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

2016-03-02 Thread Allison, Timothy B.
Those lines were added 3.5 years ago: http://svn.apache.org/viewvc?view=revision&revision=1369624 -----Original Message----- From: Bob Paulin [mailto:b...@bobpaulin.com] Sent: Wednesday, March 02, 2016 8:47 AM To: dev@tika.apache.org Subject: Re: trunk build failing in bundle --, cxf class not f

RE: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

2016-03-02 Thread Allison, Timothy B.
So it was my fault... argh... unintended consequences... Thank you! -----Original Message----- From: Bob Paulin [mailto:b...@bobpaulin.com] Sent: Wednesday, March 02, 2016 8:47 AM To: dev@tika.apache.org Subject: Re: trunk build failing in bundle --, cxf class not found for GrobidRESTParser? I s

Re: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

2016-03-02 Thread Bob Paulin
Also, as a follow-up: this means that the JournalParser would never have worked in tika-bundle, since the org.apache.cxf.jaxrs.ext.multipart package is required for the GrobidRESTParser to run. Is there a reason this was not included? I'm guessing the cxf-rt-rs-client dependency maybe caused prob

Re: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

2016-03-02 Thread Bob Paulin
I saw it on the 2.x branch, but now that you mention it's also happening in trunk, I think I see the issue. The change to the PDFParser adds dependencies on the javax.xml.stream package. The tika-bundle currently has that package marked optional: javax.xml.stream;version="[1.0,2)";r
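With `resolution:=optional`, an OSGi bundle still resolves when no other bundle exports the package. The failure then only appears later, as a `ClassNotFoundException` or `NoClassDefFoundError`, when code touches those classes. That matches the "class not found" symptom in this thread. One common defensive pattern is to probe for a class before enabling the feature that needs it. The plain-JDK sketch below shows that pattern. It is not something the thread says Tika does, and the helper name `isAvailable` is hypothetical.

```java
// Sketch only: checks whether a class is loadable before relying on it.
// Hypothetical helper; not part of Tika.
class OptionalDependencySketch {
    // Returns true if className can be loaded through loader (without initializing it).
    static boolean isAvailable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class is absent.
            // LinkageError (e.g. NoClassDefFoundError): it exists but a dependency does not.
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = OptionalDependencySketch.class.getClassLoader();
        // Class assumed to live in the CXF package named above; false unless CXF is on the classpath.
        System.out.println(isAvailable("org.apache.cxf.jaxrs.ext.multipart.Attachment", loader));
        System.out.println(isAvailable("java.lang.String", loader));
    }
}
```

A probe like this only avoids a crash. The cleaner fix for a parser that always needs the package is to make the import mandatory or embed the dependency.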

trunk build failing in bundle --, cxf class not found for GrobidRESTParser?

2016-03-02 Thread Allison, Timothy B.
Anyone have an idea why trunk is now failing? I couldn't find any changes between the last successful build and last night's failures that would explain this. Test set: org.apache.tika.bundle.BundleIT --- Tests run: 9,

[jira] [Commented] (TIKA-1816) Lenient testing for NamedEntityParser

2016-03-02 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175318#comment-15175318 ] ASF GitHub Bot commented on TIKA-1816: GitHub user thammegowda opened a pull request:

[GitHub] tika pull request: TIKA1816 : NER model download via maven proxy (...

2016-03-02 Thread thammegowda
GitHub user thammegowda opened a pull request: https://github.com/apache/tika/pull/84 TIKA1816 : NER model download via maven proxy (from 1.x to 2.x) This PR brings the proxy-based downloading feature from the 1.x branch to 2.x. Closes TIKA-1816

Need suggestion on file type .HFA to be added Tika.

2016-03-02 Thread Nandan Padar Chandrashekar
Hi All, I have identified the HFA (Hierarchical File Architecture) file format, which Tika does not currently detect. File format details: extension: *.hfa; the header tag contains the string EHFA_HEADER_TAG. Links: 1. ftp://ftp.ecn.purdue.edu/jshan/86/help/html/appendices/hfa_object_directory.h
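In Tika, a rule like this would normally be a glob and magic entry in tika-mimetypes.xml, not Java code. The JDK-only sketch below (Java 11+ for `readNBytes(int)`) just shows the prefix comparison that such a magic match performs. It assumes the EHFA_HEADER_TAG string starts at offset 0. The type name `application/x-hfa-placeholder` is a hypothetical placeholder, not the name the eventual patch uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch only: compares the first bytes of a stream with the HFA header tag.
// ASSUMPTIONS: the tag starts at offset 0; HFA_TYPE is a hypothetical placeholder name.
class HfaMagicSketch {
    static final String HFA_TYPE = "application/x-hfa-placeholder"; // hypothetical
    static final String FALLBACK_TYPE = "application/octet-stream";
    static final byte[] MAGIC = "EHFA_HEADER_TAG".getBytes(StandardCharsets.US_ASCII);

    // Reads at most MAGIC.length bytes; shorter streams simply fail to match.
    static String detect(InputStream in) throws IOException {
        byte[] prefix = in.readNBytes(MAGIC.length);
        return Arrays.equals(prefix, MAGIC) ? HFA_TYPE : FALLBACK_TYPE;
    }

    public static void main(String[] args) throws IOException {
        byte[] hfa = "EHFA_HEADER_TAG\0...".getBytes(StandardCharsets.US_ASCII);
        byte[] other = "plain text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(new ByteArrayInputStream(hfa)));   // application/x-hfa-placeholder
        System.out.println(detect(new ByteArrayInputStream(other))); // application/octet-stream
    }
}
```

Matching the full tag rather than a shorter prefix keeps false positives unlikely. A declarative magic entry has the advantage that the .hfa glob and the header check live in one place.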

[jira] [Created] (TIKA-1884) Updating Tika Mime Repository

2016-03-02 Thread Aditya Ramachandra Desai (JIRA)
Aditya Ramachandra Desai created TIKA-1884: Summary: Updating Tika Mime Repository Key: TIKA-1884 URL: https://issues.apache.org/jira/browse/TIKA-1884 Project: Tika Issue Type: Impr

[jira] [Updated] (TIKA-1883) Identification of Mime Type for Empty Files

2016-03-02 Thread Aditya Ramachandra Desai (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Ramachandra Desai updated TIKA-1883: Description: Identification of Mime types for empty files, updating TIKA 1.12 s
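TIKA-1883 concerns identifying the MIME type of empty files. The sketch below is a minimal version of that check. It follows the mark/reset convention Tika detectors are expected to honour: mark before reading, reset before returning. The returned name `application/x-zerosize` is borrowed from freedesktop shared-mime-info as an assumption, and may not be what the linked patch uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch only: reports an "empty" type for zero-length input and leaves the
// stream where it was. EMPTY_TYPE is an ASSUMED name, not taken from the patch.
class EmptyFileSketch {
    static final String EMPTY_TYPE = "application/x-zerosize"; // assumed name
    static final String UNKNOWN_TYPE = "application/octet-stream";

    static String detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(1);
        try {
            // read() returns -1 at end of stream, i.e. there are no bytes at all.
            return in.read() == -1 ? EMPTY_TYPE : UNKNOWN_TYPE;
        } finally {
            in.reset(); // rewind so later detectors see the same bytes
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(detect(new ByteArrayInputStream(new byte[0])));     // application/x-zerosize
        System.out.println(detect(new ByteArrayInputStream(new byte[] {1}))); // application/octet-stream
    }
}
```

Checking the stream rather than a file length means the same logic works for input that does not come from the filesystem.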

[jira] [Created] (TIKA-1883) Identification of Mime Type for Empty Files

2016-03-02 Thread Aditya Ramachandra Desai (JIRA)
Aditya Ramachandra Desai created TIKA-1883: Summary: Identification of Mime Type for Empty Files Key: TIKA-1883 URL: https://issues.apache.org/jira/browse/TIKA-1883 Project: Tika Is