[jira] [Commented] (TIKA-2372) OSX DMG support

2017-05-18 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016811#comment-16016811 ] Luis Filipe Nassif commented on TIKA-2372: -- 7zip supports it and dozens of other formats (iso,

Re: Tika App, Extract (-z) and Inline PDF Images?

2017-05-18 Thread Timothy Allison
I think this would be ok if we added a warning that -z is different and a pointer to changing the config? On 2017-05-18 17:02 (-0400), Nick Burch wrote: > Hi All > > I've just been caught out by the Tika App's -z on a PDF not extracting the embedded images. I think we probably shouldn't

Tika App, Extract (-z) and Inline PDF Images?

2017-05-18 Thread Nick Burch
Hi All I've just been caught out by the Tika App's -z on a PDF not extracting the embedded images. I think we probably shouldn't tweak the default config for the other Tika App modes, but what about extract? Any reason why we shouldn't turn on the PDF Parser option "extractInlineImages" when

[jira] [Commented] (TIKA-2372) OSX DMG support

2017-05-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016352#comment-16016352 ] Hudson commented on TIKA-2372: -- UNSTABLE: Integrated in Jenkins build Tika-trunk #1271 (See

[jira] [Commented] (TIKA-2372) OSX DMG support

2017-05-18 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016324#comment-16016324 ] Nick Burch commented on TIKA-2372: -- For a GPL licensed library, catacombae

[jira] [Created] (TIKA-2372) OSX DMG support

2017-05-18 Thread Nick Burch (JIRA)
Nick Burch created TIKA-2372: Summary: OSX DMG support Key: TIKA-2372 URL: https://issues.apache.org/jira/browse/TIKA-2372 Project: Tika Issue Type: Improvement Components: parser

[jira] [Commented] (TIKA-1334) Add presentation layer for results of each run

2017-05-18 Thread Tom Barber (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016303#comment-16016303 ] Tom Barber commented on TIKA-1334: -- my new guy is looking for an excuse to get started in programming I'll

RE: TikaInputStream parse the content and write to OutputStream

2017-05-18 Thread Allison, Timothy B.
Please DO NOT use Apache Tika for malware scanning. Please use a package that is designed for malware detection. From: Prateek Agarwal [mailto:pra.a...@gmail.com] Sent: Thursday, May 18, 2017 8:17 AM To: Allison, Timothy B.; dev@tika.apache.org Subject: Re: TikaInputStream

RE: Tika 1.15

2017-05-18 Thread Allison, Timothy B.
+1 Thank you! -Original Message- From: Chris Mattmann [mailto:mattm...@apache.org] Sent: Thursday, May 18, 2017 10:15 AM To: dev@tika.apache.org Subject: Re: Tika 1.15 Hey Tim, I am, Luis is, you are, that’s probably a good enough start. I’ll roll the RC this afternoon, early AM Pacific

Re: Tika 1.15

2017-05-18 Thread Chris Mattmann
Hey Tim, I am, Luis is, you are, that’s probably a good enough start. I’ll roll the RC this afternoon, early AM Pacific tomorrow! Cheers, Chris On 5/18/17, 3:56 AM, "Allison, Timothy B." wrote: Yes, yes we are...if you and fellow devs are ok with the log message in

[jira] [Commented] (TIKA-2359) Extreme slow parsing on the attachment attached

2017-05-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015808#comment-16015808 ] Chris A. Mattmann commented on TIKA-2359: - totally agree! this is good for 1.15! thanks Tim and

[jira] [Resolved] (TIKA-2368) Clean up SentimentParser dependencies

2017-05-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved TIKA-2368. - Resolution: Fixed Assignee: Tim Allison Fix Version/s: 1.15 thanks Tim! >

[jira] [Commented] (TIKA-2368) Clean up SentimentParser dependencies

2017-05-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015800#comment-16015800 ] Chris A. Mattmann commented on TIKA-2368: - +1 > Clean up SentimentParser dependencies >

[jira] [Created] (TIKA-2371) Check properties presence - PDFParser

2017-05-18 Thread Julien Massiera (JIRA)
Julien Massiera created TIKA-2371: - Summary: Check properties presence - PDFParser Key: TIKA-2371 URL: https://issues.apache.org/jira/browse/TIKA-2371 Project: Tika Issue Type: Improvement

[commons-text] Regarding code consolidation.

2017-05-18 Thread Rob Tompkins
Hello all, Over the last year or so we in Commons have been working towards a newly released component, “commons-text,” and we were wondering if folks wanted to begin consuming commons-text so that we can consolidate the maintenance of the code performing edit distances and similarity scores (for

[jira] [Commented] (TIKA-2359) Extreme slow parsing on the attachment attached

2017-05-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015636#comment-16015636 ] Hudson commented on TIKA-2359: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1270 (See

[jira] [Commented] (TIKA-2368) Clean up SentimentParser dependencies

2017-05-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015635#comment-16015635 ] Hudson commented on TIKA-2368: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1270 (See

[jira] [Commented] (TIKA-2359) Extreme slow parsing on the attachment attached

2017-05-18 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015601#comment-16015601 ] Luis Filipe Nassif commented on TIKA-2359: -- Hi [~talli...@mitre.org]! I am ok with the message for

[jira] [Commented] (TIKA-2370) Close Font in TrueTypeParser

2017-05-18 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015597#comment-16015597 ] Hudson commented on TIKA-2370: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1269 (See

RE: Tika 1.15

2017-05-18 Thread Allison, Timothy B.
Yes, yes we are...if you and fellow devs are ok with the log message in TIKA-2359. Happy to change that message if there are any concerns/recommendations. Onward! Thank you! Cheers, Tim -Original Message- From: Chris Mattmann [mailto:mattm...@apache.org] Sent: Wednesday,

[jira] [Comment Edited] (TIKA-2359) Extreme slow parsing on the attachment attached

2017-05-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015571#comment-16015571 ] Tim Allison edited comment on TIKA-2359 at 5/18/17 10:55 AM: - I just added

[jira] [Commented] (TIKA-2359) Extreme slow parsing on the attachment attached

2017-05-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015571#comment-16015571 ] Tim Allison commented on TIKA-2359: --- How about: {noformat} LOG.info("Tesseract OCR is

[jira] [Resolved] (TIKA-2360) Handle SentimentParser resource failure more robustly

2017-05-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2360. --- Resolution: Fixed Sounds like we're in concurrence. Again, [~chrismattmann], apologies for moving

[jira] [Commented] (TIKA-2368) Clean up SentimentParser dependencies

2017-05-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015560#comment-16015560 ] Tim Allison commented on TIKA-2368: --- I added tika-translate to the exclusion list. We'll still get an

[jira] [Updated] (TIKA-2368) Clean up SentimentParser dependencies

2017-05-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2368: -- Priority: Minor (was: Blocker) > Clean up SentimentParser dependencies >

RE: TikaInputStream parse the content and write to OutputStream

2017-05-18 Thread Allison, Timothy B.
While Apache Tika can be used to support forensic analysis/malware detection, it is NOT designed to identify malware. DO NOT rely on Apache Tika to identify malware. I'd recommend using clamav or a commercial antivirus program. If you want to use Tika for another reason (text/metadata

[jira] [Updated] (TIKA-2370) Close Font in TrueTypeParser

2017-05-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2370: -- Description: [~icirellik] opened https://github.com/apache/tika/pull/181 to point out that we're not

[jira] [Updated] (TIKA-2370) Close Font in TrueTypeParser

2017-05-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2370: -- Description: [~icirellik] opened https://github.com/apache/tika/pull/181 to point out that we're not

[jira] [Resolved] (TIKA-2370) Close Font in TrueTypeParser

2017-05-18 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2370. --- Resolution: Fixed Fix Version/s: 1.15 Thank you [~icirellik]! > Close Font in TrueTypeParser >

[jira] [Created] (TIKA-2370) Close Font in TrueTypeParser

2017-05-18 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2370: - Summary: Close Font in TrueTypeParser Key: TIKA-2370 URL: https://issues.apache.org/jira/browse/TIKA-2370 Project: Tika Issue Type: Bug Reporter: Tim