[jira] [Commented] (TIKA-713) Tika can not parse all of the persian pdf files

2011-10-05 Thread Ahmad Ajiloo (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121376#comment-13121376 ] Ahmad Ajiloo commented on TIKA-713: --- Thanks a lot > Tika can not parse al

[jira] [Resolved] (TIKA-642) Few of RTF files not extracting properly

2011-10-05 Thread Jukka Zitting (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-642. Resolution: Duplicate The example file no longer causes problems with the latest trunk, so I guess t

[jira] [Commented] (TIKA-636) Taking very high heap space while parsing docx - Resulting in OOM in tha app

2011-10-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121157#comment-13121157 ] Jukka Zitting commented on TIKA-636: Do you still see this problem with Tika 0.10? If ye

[jira] [Resolved] (TIKA-744) Drop support for Java 1.4

2011-10-05 Thread Jukka Zitting (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-744. Resolution: Fixed Fix Version/s: 1.0 Assignee: Jukka Zitting Done in revision 1179322

[jira] [Created] (TIKA-744) Drop support for Java 1.4

2011-10-05 Thread Jukka Zitting (Created) (JIRA)
Drop support for Java 1.4 - Key: TIKA-744 URL: https://issues.apache.org/jira/browse/TIKA-744 Project: Tika Issue Type: Improvement Reporter: Jukka Zitting Priority: Minor Since TIKA-175 we'v

[jira] [Commented] (TIKA-605) Tika GDAL parser

2011-10-05 Thread Chris A. Mattmann (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121143#comment-13121143 ] Chris A. Mattmann commented on TIKA-605: Thanks Jukka that really helps!

[jira] [Resolved] (TIKA-699) Automatic checks against backwards-incompatible API changes

2011-10-05 Thread Jukka Zitting (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-699. Resolution: Fixed Fix Version/s: 1.0 Assignee: Jukka Zitting Added checks for tika-co

[jira] [Updated] (TIKA-605) Tika GDAL parser

2011-10-05 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-605: --- Attachment: 0001-TIKA-605-Tika-GDAL-parser.patch I guess ideally we should ask the GDAL toolkit to supp

[jira] [Commented] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2011-10-05 Thread Erik Hetzner (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121064#comment-13121064 ] Erik Hetzner commented on TIKA-741: --- 100 levels should probably do the trick. Thanks!
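
For readers unfamiliar with the kind of check discussed in TIKA-741, here is a minimal sketch of an XML nesting limit built on the JDK's SAX API. It is not Tika's implementation: the class name `DepthLimitHandler` and the helper `withinLimit` are hypothetical, and the limit of 100 in the usage note is only the figure mentioned in the comment above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: aborts parsing once element nesting exceeds maxDepth.
class DepthLimitHandler extends DefaultHandler {
    private final int maxDepth;
    private int depth = 0;

    DepthLimitHandler(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        depth++;
        if (depth > maxDepth) {
            throw new SAXException("XML nesting deeper than " + maxDepth + " levels");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    // Returns false if parsing fails with a SAXException; note that this
    // includes malformed XML as well as documents exceeding the depth limit.
    static boolean withinLimit(String xml, int maxDepth) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DepthLimitHandler(maxDepth));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

With this sketch, `DepthLimitHandler.withinLimit(xml, 100)` accepts ordinary documents while still rejecting pathologically deep ones; a stricter limit trades false positives for safety, which is the tension the issue title describes.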

[jira] [Resolved] (TIKA-730) WriteOutContentHandler concatenates title tag and body text.

2011-10-05 Thread Jukka Zitting (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-730. Resolution: Won't Fix Resolving as Won't Fix since in this case the WriteOutContentHandler class wor

[jira] [Commented] (TIKA-734) Out of memory exception with Xlsx file less than 5 MB

2011-10-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121032#comment-13121032 ] Jukka Zitting commented on TIKA-734: Tika 0.10 is now available. If the problem still oc

[jira] [Resolved] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2011-10-05 Thread Jukka Zitting (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-741. Resolution: Fixed Fix Version/s: 1.0 Assignee: Jukka Zitting In revision 1179254 I in

[jira] [Updated] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2011-10-05 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-741: --- Affects Version/s: (was: 1.0) 0.10 Issue Type: Bug (was: New Feat

[jira] [Updated] (TIKA-740) SAX parser used for HTML

2011-10-05 Thread Jukka Zitting (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-740: --- Attachment: a221657.html I attached a copy of the page served at the referenced URL http://www.almasry-

[jira] [Resolved] (TIKA-739) For certain DWG files, the Tika content parser outputs garbage

2011-10-05 Thread Jukka Zitting (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-739. Resolution: Fixed Fix Version/s: 1.0 Assignee: Jukka Zitting I fixed this in revision

Re: Download-Link to tika-app-0.10.jar doesn't work

2011-10-05 Thread Jukka Zitting
Hi, On Wed, Oct 5, 2011 at 11:06 AM, Bernhard Berger wrote: > the download link to tika-app-0.10.jar (runnable jar) from > http://tika.apache.org/download.html doesn't work (Error 404). Thanks for letting us know! Fixed in revision 1179211. > By the way, how can I log in to the issue tracker to

Jenkins build is back to normal : Tika-trunk » Apache Tika parsers #665

2011-10-05 Thread Apache Jenkins Server
See

Jenkins build is back to normal : Tika-trunk #665

2011-10-05 Thread Apache Jenkins Server
See

[jira] [Resolved] (TIKA-743) Upgrade to Apache parent POM version 10

2011-10-05 Thread Jukka Zitting (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-743. Resolution: Fixed Done in revision 1179209. > Upgrade to Apache parent POM version 1

[jira] [Created] (TIKA-743) Upgrade to Apache parent POM version 10

2011-10-05 Thread Jukka Zitting (Created) (JIRA)
Upgrade to Apache parent POM version 10 --- Key: TIKA-743 URL: https://issues.apache.org/jira/browse/TIKA-743 Project: Tika Issue Type: Improvement Reporter: Jukka Zitting Assignee:

Re: Build failed in Jenkins: Tika-trunk #664

2011-10-05 Thread Michael McCandless
Ugh, my bad: Java 1.6 only code. I'll fix... Mike McCandless http://blog.mikemccandless.com On Wed, Oct 5, 2011 at 7:13 AM, Apache Jenkins Server wrote: > See > > Changes: > > [mikemccand] TIKA-742: extract paragraphs inside PDF pages > >

Re: Build failed in Jenkins: Tika-trunk #664

2011-10-05 Thread Jukka Zitting
Hi, On Wed, Oct 5, 2011 at 1:13 PM, Apache Jenkins Server wrote: > symbol  : constructor IOException(org.xml.sax.SAXException) The IOExceptionWithCause class [1] provides such a constructor for use with Java 5. [1] http://tika.apache.org/0.10/api/org/apache/tika/io/IOExceptionWithCause.html BR
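
The build failure above comes from the `IOException(Throwable)` constructor, which only exists since Java 6. As a sketch of the Java 5 compatible pattern that a class like `IOExceptionWithCause` makes convenient, the cause can be attached with `Throwable.initCause`. The class `Java5IoExceptions` and its `wrap` method are hypothetical names used only for this illustration, not part of Tika.

```java
import java.io.IOException;
import org.xml.sax.SAXException;

// Hypothetical helper illustrating how to chain a cause onto an IOException
// without the Java 6 only IOException(Throwable) constructor.
class Java5IoExceptions {
    static IOException wrap(SAXException cause) {
        // IOException(String) is available on Java 5.
        IOException wrapped = new IOException(cause.getMessage());
        // initCause may be called once on an exception created without a cause.
        wrapped.initCause(cause);
        return wrapped;
    }
}
```

A caller would then write `throw Java5IoExceptions.wrap(e);` inside a `catch (SAXException e)` block, and the original exception stays reachable through `getCause()`.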

Build failed in Jenkins: Tika-trunk » Apache Tika parsers #664

2011-10-05 Thread Apache Jenkins Server
See Changes: [mikemccand] TIKA-742: extract paragraphs inside PDF pages -- [INFO] [INFO] -

Build failed in Jenkins: Tika-trunk #664

2011-10-05 Thread Apache Jenkins Server
See Changes: [mikemccand] TIKA-742: extract paragraphs inside PDF pages -- [...truncated 92 lines...] [INFO] Setting property: resource.manager.logwhenfound => 'false'. [TASKS] Skipping maven reporter:

[jira] [Resolved] (TIKA-742) PDF2XHTML fails to insert nor space around page marker

2011-10-05 Thread Michael McCandless (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved TIKA-742. - Resolution: Fixed > PDF2XHTML fails to insert nor space around page marker > -

Download-Link to tika-app-0.10.jar doesn't work

2011-10-05 Thread Bernhard Berger
Hello, the download link to tika-app-0.10.jar (runnable jar) from http://tika.apache.org/download.html doesn't work (Error 404). (The source-jar works.) yours, Bernhard By the way, how can I log in to the issue tracker to add a new bug?

[jira] [Resolved] (TIKA-622) Switch from POIFSFileSystem to NPOIFSFileSystem, for speed and memory improvements

2011-10-05 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-622. - Resolution: Fixed Fix Version/s: 0.10 This was fixed back in April in r1091046 > Sw