[jira] [Commented] (TIKA-2107) Old MS Word files give error while indexing

2016-10-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549934#comment-15549934 ] Tim Allison commented on TIKA-2107: One recommendation from Twitter was the LibreOffice command line
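Since POI cannot read these Word 2 files, the Twitter suggestion amounts to converting them with LibreOffice before handing them to Tika. The following is a minimal Java sketch of that idea, not a Tika or POI API. It assumes `soffice` is on the PATH and that the installed LibreOffice can actually open the legacy file; the class and method names (`LegacyWordConverter`, `buildConvertCommand`, `convertedPath`, `convert`) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: converts a legacy Word file to .docx via LibreOffice.
final class LegacyWordConverter {

    // Builds the headless LibreOffice command line (assumes "soffice" is on the PATH).
    static List<String> buildConvertCommand(Path input, Path outDir) {
        return Arrays.asList(
                "soffice", "--headless",
                "--convert-to", "docx",
                "--outdir", outDir.toString(),
                input.toString());
    }

    // LibreOffice names the output after the input, with the last extension replaced.
    static Path convertedPath(Path input, Path outDir) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + ".docx");
    }

    // Runs the conversion and returns the path of the .docx it produced.
    static Path convert(Path input, Path outDir) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildConvertCommand(input, outDir))
                .inheritIO()
                .start();
        int exit = p.waitFor();
        Path target = convertedPath(input, outDir);
        // Check for the output file too, not just the exit code.
        if (exit != 0 || !Files.exists(target)) {
            throw new IOException("conversion failed (exit code " + exit + ") for " + input);
        }
        return target;
    }
}
```

For example, `LegacyWordConverter.convert(Paths.get("old.doc"), Paths.get("out"))` would be expected to yield `out/old.docx`, which can then be parsed with Tika's normal .docx support. Shelling out keeps LibreOffice's native code outside the JVM, at the cost of an external dependency and a process launch per file.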

Re: Apache Tika's public regression corpus

2016-10-05 Thread Dominik Stadler
Great writeup, Tim, thanks for taking the time to tell people about things that we do! Dominik. On Wed, Oct 5, 2016 at 7:56 PM, Allison, Timothy B. wrote: > All, > > I recently blogged about some of the work we're doing with a large scale > regression corpus to make Tika,

Apache Tika's public regression corpus

2016-10-05 Thread Allison, Timothy B.
All, I recently blogged about some of the work we're doing with a large scale regression corpus to make Tika, POI and PDFBox more robust and to identify regressions before release. If you'd like to chip in with recommendations, requests or Hadoop/Spark clusters (why not shoot for the stars),

Re: TIKA-1302 blog post

2016-10-05 Thread Mattmann, Chris A (3980)
Tim this is GREAT! Please link it from the wiki that mentions web resource document links. I think: http://wiki.apache.org/tika/TikaResources I fell behind on spinning the release. Will try and make progress today. Chris

[jira] [Commented] (TIKA-2107) Old MS Word files give error while indexing

2016-10-05 Thread Gaurav (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548782#comment-15548782 ] Gaurav commented on TIKA-2107: Please suggest a workaround to parse these files. > Old MS Word files give

Re: tika-2.x - Build # 156 - Failure

2016-10-05 Thread Nick Burch
On Wed, 5 Oct 2016, Apache Jenkins Server wrote: The Apache Jenkins build system has built tika-2.x (build #156) Check console output at https://builds.apache.org/job/tika-2.x/156/ to view the results. Another one for our Jenkins experts. Looks like it needs a bit more memory for the job,

tika-2.x - Build # 156 - Failure

2016-10-05 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x (build #156) Status: Failure Check console output at https://builds.apache.org/job/tika-2.x/156/ to view the results.

Re: tika-2.x-windows - Build # 60 - Still Failing

2016-10-05 Thread Nick Burch
On Wed, 5 Oct 2016, Apache Jenkins Server wrote: The Apache Jenkins build system has built tika-2.x-windows (build #60) Check console output at https://builds.apache.org/job/tika-2.x-windows/60/ to view the results. Anyone with Jenkins-foo able to fix our Windows Jenkins builds? This failed

[jira] [Commented] (TIKA-2107) Old MS Word files give error while indexing

2016-10-05 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548427#comment-15548427 ] Nick Burch commented on TIKA-2107: The attached file is an old Word 2 file, not supported by POI and

[jira] [Updated] (TIKA-2109) OutOfMemory when parsing 5MB word document

2016-10-05 Thread Julian (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian updated TIKA-2109: Attachment: zafar-bug-9.docx > OutOfMemory when parsing 5MB word document >

[jira] [Created] (TIKA-2109) OutOfMemory when parsing 5MB word document

2016-10-05 Thread Julian (JIRA)
Julian created TIKA-2109: Summary: OutOfMemory when parsing 5MB word document Key: TIKA-2109 URL: https://issues.apache.org/jira/browse/TIKA-2109 Project: Tika Issue Type: Bug Affects Versions: