[jira] [Commented] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-06-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018448#comment-14018448 ] Chris A. Mattmann commented on TIKA-1302: - +1 this sounds good to me, Tim. > Let's

[jira] [Commented] (TIKA-1311) Centralize JSON handling of Metadata

2014-06-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018432#comment-14018432 ] Hudson commented on TIKA-1311: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #19 (See [https://bu

[jira] [Commented] (TIKA-1311) Centralize JSON handling of Metadata

2014-06-04 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018421#comment-14018421 ] Hudson commented on TIKA-1311: -- SUCCESS: Integrated in tika-trunk-jdk1.6 #19 (See [https://bu

[jira] [Created] (TIKA-1323) Improve logging in JAX-RS server

2014-06-04 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1323: - Summary: Improve logging in JAX-RS server Key: TIKA-1323 URL: https://issues.apache.org/jira/browse/TIKA-1323 Project: Tika Issue Type: Improvement Rep

[jira] [Closed] (TIKA-1311) Centralize JSON handling of Metadata

2014-06-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison closed TIKA-1311. - Resolution: Fixed Fix Version/s: 1.6 Nick, Sergey and Chris, thank you, all, for your feedback. Se

Re: Review Request 22246: New parser for Matlab .mat files

2014-06-04 Thread Chris Mattmann
> On June 4, 2014, 11:25 p.m., Matthias Krueger wrote: > > The Matlab MIME types used seem to be application/x-matlab-data or application/matlab-mat. > > Would it make sense to add them to the mime XML for detection? > > MATLAB data file

Re: Review Request 22246: New parser for Matlab .mat files

2014-06-04 Thread Matthias Krueger
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22246/#review44773 --- The Matlab MIME types used seem to be application/x-matlab-data or

Re: Review Request 22246: New parser for Matlab .mat files

2014-06-04 Thread Ann Burgess
https://reviews.apache.org/r/22246/ (Updated June 4, 2014, 10:23 p.m.) Review request for tika and Chris Mattmann.

Review Request 22246: New parser for Matlab .mat files

2014-06-04 Thread Ann Burgess
https://reviews.apache.org/r/22246/ Review request for tika and Chris Mattmann. Repository: tika Description

[jira] [Commented] (TIKA-1322) XML file parse errors within archives trigger Zip bomb detection

2014-06-04 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018247#comment-14018247 ] ASF GitHub Bot commented on TIKA-1322: -- GitHub user mkr opened a pull request: ht

[GitHub] tika pull request: TIKA-1322: Properly close XMLParser's output in...

2014-06-04 Thread mkr
GitHub user mkr opened a pull request: https://github.com/apache/tika/pull/9 TIKA-1322: Properly close XMLParser's output in case of SAXException. Fix and test for https://issues.apache.org/jira/browse/TIKA-1322. You can merge this pull request into a Git repository by running:

[jira] [Created] (TIKA-1322) XML file parse errors within archives trigger Zip bomb detection

2014-06-04 Thread Matthias Krueger (JIRA)
Matthias Krueger created TIKA-1322: -- Summary: XML file parse errors within archives trigger Zip bomb detection Key: TIKA-1322 URL: https://issues.apache.org/jira/browse/TIKA-1322 Project: Tika

Re: Review Request 22219: Add Translation to Tika

2014-06-04 Thread Chris Mattmann
https://reviews.apache.org/r/22219/#review44758 trunk/tika-core/src/main/java/org/apache/tika/Tika.java

[jira] [Commented] (TIKA-1319) Translation

2014-06-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018175#comment-14018175 ] Tim Allison commented on TIKA-1319: --- +1 Perfect. Thank you, [~chrismattmann]! And, of c

[jira] [Commented] (TIKA-1319) Translation

2014-06-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018158#comment-14018158 ] Chris A. Mattmann commented on TIKA-1319: - Hmm...I'm not so convinced this should b

[jira] [Commented] (TIKA-1319) Translation

2014-06-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018152#comment-14018152 ] Lewis John McGibbney commented on TIKA-1319: [~talli...@mitre.org] .bq I worry

[jira] [Commented] (TIKA-1319) Translation

2014-06-04 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018134#comment-14018134 ] Tyler Palsulich commented on TIKA-1319: --- > separate compilation unit, say, tika-trans

[jira] [Commented] (TIKA-1319) Translation

2014-06-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018100#comment-14018100 ] Tim Allison commented on TIKA-1319: --- [~tpalsulich], thank you very much for wiring this t

[jira] [Commented] (TIKA-1232) Add PDF version to PDFParser output

2014-06-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018077#comment-14018077 ] Tim Allison commented on TIKA-1232: --- [~tpalsulich], the PDFBox fix should be in v 1.8.6.

Re: Review Request 22219: Add Translation to Tika

2014-06-04 Thread Tyler Palsulich
https://reviews.apache.org/r/22219/ (Updated June 4, 2014, 7:11 p.m.) Review request for tika and Chris Mattmann.

Re: Review Request 22219: Add Translation to Tika

2014-06-04 Thread Tyler Palsulich
https://reviews.apache.org/r/22219/ (Updated June 4, 2014, 7:17 p.m.) Review request for tika and Chris Mattmann.

[jira] [Commented] (TIKA-1319) Translation

2014-06-04 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018034#comment-14018034 ] Tyler Palsulich commented on TIKA-1319: --- Thanks, Chris and Paul. I just updated the p

Re: Review Request 22219: Add Translation to Tika

2014-06-04 Thread Chris Mattmann
https://reviews.apache.org/r/22219/#review44732 Tyler this is a great first start! I think if you turn Translator in

Re: Review Request 22219: Add Translation to Tika

2014-06-04 Thread Chris Mattmann
https://reviews.apache.org/r/22219/#review44731 trunk/tika-core/src/main/java/org/apache/tika/Tika.java

Re: unit test error for new parser

2014-06-04 Thread Mattmann, Chris A (3980)
Great work, Tyler. Can you and Annie make sure there is a JIRA issue for this, and maybe put your patch up on Review Board? I would be happy to help shepherd it into the sources. ++ Chris Mattmann, Ph.D. Chief Architect Instrument

[jira] [Commented] (TIKA-1311) Centralize JSON handling of Metadata

2014-06-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017963#comment-14017963 ] Chris A. Mattmann commented on TIKA-1311: - Tim, patch looks good. I reviewed the pa

Re: unit test error for new parser

2014-06-04 Thread Mattmann, Chris A (3980)
Awesome job you two. ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm...@

[jira] [Commented] (TIKA-1319) Translation

2014-06-04 Thread Paul Ramirez (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017953#comment-14017953 ] Paul Ramirez commented on TIKA-1319: Looks good. Couple of questions came to mind. Woul

[jira] [Commented] (TIKA-1319) Translation

2014-06-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017952#comment-14017952 ] Chris A. Mattmann commented on TIKA-1319: - Tyler this is great! I think we should t

[jira] [Assigned] (TIKA-1319) Translation

2014-06-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned TIKA-1319: --- Assignee: Chris A. Mattmann > Translation > --- > > Key: TIKA-

Review Request 22219: Add Translation to Tika

2014-06-04 Thread Tyler Palsulich
https://reviews.apache.org/r/22219/ Review request for tika and Chris Mattmann. Repository: tika Description

Re: unit test error for new parser

2014-06-04 Thread Tyler Palsulich
Hi Annie, I put together a patch to work your parser into Tika under org.apache.tika.parser.mat. You'll need to put a valid MATLAB file (I called it MatlabFile.m, but just update the test) under test-documents for the parsing in the test to work properly -- the file I had failed because of "om.jma

[jira] [Commented] (TIKA-1232) Add PDF version to PDFParser output

2014-06-04 Thread Johan van der Knijff (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017879#comment-14017879 ] Johan van der Knijff commented on TIKA-1232: I'm currently away and unable to r

[jira] [Updated] (TIKA-1232) Add PDF version to PDFParser output

2014-06-04 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-1232: -- Attachment: testComment.pdf I just tried to remove the testMetadataEquality workaround for [PDF

[jira] [Commented] (TIKA-1319) Translation

2014-06-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017830#comment-14017830 ] Lewis John McGibbney commented on TIKA-1319: I think this is a nice (and pretty

[jira] [Updated] (TIKA-1212) Recursive Extraction of Archive File

2014-06-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1212: -- Attachment: RECURSIVE_PARSER_WRAPPER_HACK.patch This is nowhere near ready to go, but this shows the sol

[jira] [Updated] (TIKA-1212) Recursive Extraction of Archive File

2014-06-04 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-1212: - Attachment: RecursiveParsingExample.java This is why we should have our examples pulled from svn, where we

[jira] [Comment Edited] (TIKA-1212) Recursive Extraction of Archive File

2014-06-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017744#comment-14017744 ] Tim Allison edited comment on TIKA-1212 at 6/4/14 2:58 PM: --- Great

[jira] [Commented] (TIKA-1212) Recursive Extraction of Archive File

2014-06-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017744#comment-14017744 ] Tim Allison commented on TIKA-1212: --- Great. Thank you! > Recursive Extraction of Archiv

[jira] [Commented] (TIKA-1212) Recursive Extraction of Archive File

2014-06-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017723#comment-14017723 ] Tim Allison commented on TIKA-1212: --- The only solution that I could find was to use a tra

[jira] [Comment Edited] (TIKA-1212) Recursive Extraction of Archive File

2014-06-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017721#comment-14017721 ] Tim Allison edited comment on TIKA-1212 at 6/4/14 2:26 PM: --- [~gag

[jira] [Updated] (TIKA-1212) Recursive Extraction of Archive File

2014-06-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1212: -- Attachment: test_recursive_embedded.docx [~gagravarr], I'm not sure that the example code works with the

[jira] [Closed] (TIKA-1320) extract text from jpeg in solr tika

2014-06-04 Thread muruganv (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] muruganv closed TIKA-1320. -- Resolution: Fixed > extract text from jpeg in solr tika > --- > >

[jira] [Commented] (TIKA-1318) Use of Deprecated Word6Extractor.getParagraphText() Method

2014-06-04 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017705#comment-14017705 ] Nick Burch commented on TIKA-1318: -- HWPFOldDocument (the one for word 6) extends HWPFDocum

[jira] [Comment Edited] (TIKA-1311) Centralize JSON handling of Metadata

2014-06-04 Thread Sergey Beryozkin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017687#comment-14017687 ] Sergey Beryozkin edited comment on TIKA-1311 at 6/4/14 1:29 PM: -

[jira] [Commented] (TIKA-1311) Centralize JSON handling of Metadata

2014-06-04 Thread Sergey Beryozkin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017687#comment-14017687 ] Sergey Beryozkin commented on TIKA-1311: Hi, yes, adding MBR would lead to a simple

[jira] [Commented] (TIKA-1311) Centralize JSON handling of Metadata

2014-06-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017655#comment-14017655 ] Tim Allison commented on TIKA-1311: --- [~gagravarr], thank you, as always, for your thought

[jira] [Created] (TIKA-1321) Add experimental Stax/Streaming XWPF/docx extractor

2014-06-04 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1321: - Summary: Add experimental Stax/Streaming XWPF/docx extractor Key: TIKA-1321 URL: https://issues.apache.org/jira/browse/TIKA-1321 Project: Tika Issue Type: New Feat

[jira] [Commented] (TIKA-1320) extract text from jpeg in solr tika

2014-06-04 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017613#comment-14017613 ] Ray Gauss II commented on TIKA-1320: I'm not sure we have enough context in the descrip

Re: unit test error for new parser

2014-06-04 Thread Matthias Krueger
Hi Annie, [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /Users/annbryant/TIKA/tika/tika-parsers/src/main/java/org/apache/tika/parser/mat/MatParser.java:[69,23] cannot

[jira] [Commented] (TIKA-1320) extract text from jpeg in solr tika

2014-06-04 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017473#comment-14017473 ] Hong-Thai Nguyen commented on TIKA-1320: OCR is a solution: TIKA-93. Unfortunately,