0.8 release: latest status

2010-10-31 Thread Mattmann, Chris A (388J)
Hey Guys, OK, I pushed off all issues in JIRA and have been frantically trying to fix the ones that I was interested in. We're down to 1 issue left, TIKA-462, Ken's issue to push Boilerpipe to Maven Central. Once that goes through, and once I get a little help from Jukka on excluding tests based o

[jira] Updated: (TIKA-503) Add a ContentHandler for collecting links from parser output

2010-10-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-503: --- Component/s: parser - classify > Add a ContentHandler for collecting links from parser output

[jira] Resolved: (TIKA-531) xmpTPg:NPages creates invalid XML

2010-10-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved TIKA-531. Resolution: Not A Problem - I agree with Jukka on this one. The tags shouldn't be invalid acc

[jira] Resolved: (TIKA-503) Add a ContentHandler for collecting links from parser output

2010-10-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved TIKA-503. Resolution: Fixed - the current version of 0.8 trunk includes a working version of this Cont

[jira] Updated: (TIKA-539) Encoding detection is too biased by encoding in meta tag

2010-10-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-539: --- Fix Version/s: (was: 0.8) 0.9 - pushing out to 0.9 -- even though there'

[jira] Updated: (TIKA-533) Mis-detection of zip files as application/vnd.apple.iwork

2010-10-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-533: --- Fix Version/s: (was: 0.8) 0.9 - pushing out to 0.9 -- there's no patch f

[jira] Updated: (TIKA-525) Mismatched start and end elements in HtmlParser

2010-10-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-525: --- Fix Version/s: (was: 0.8) 0.9 - pushing out to 0.9 -- there's no patch f

[jira] Updated: (TIKA-390) Missing Header/Footer text for ODT documents

2010-10-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-390: --- Affects Version/s: 0.8 Fix Version/s: (was: 0.8) 0.9 - pushi

Hudson build is still unstable: Tika-trunk #396

2010-10-31 Thread Apache Hudson Server
See

Hudson build is still unstable: Tika-trunk » Apache Tika parsers #396

2010-10-31 Thread Apache Hudson Server
See

[jira] Updated: (TIKA-526) OOXMLParser fails to extract text from within smart tags

2010-10-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-526: --- Fix Version/s: (was: 0.8) 0.9 - pushing out to 0.9 -- there's no patch f

[jira] Updated: (TIKA-538) Add method get file extension from MimeTypes

2010-10-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-538: --- Fix Version/s: (was: 0.8) 0.9 - pushing out to 0.9 -- there's no patch f

[jira] Updated: (TIKA-497) HtmlHandler should fix up incorrect capitalization of names in attributes before putting into metadata

2010-10-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-497: --- Fix Version/s: (was: 0.8) 0.9 - pushing out to 0.9 -- there's no patch f

[jira] Updated: (TIKA-508) HtmlParser link processing should skip usemap and codebase attributes

2010-10-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-508: --- Fix Version/s: (was: 0.8) 0.9 - pushing out to 0.9 -- there's no patch f

[jira] Updated: (TIKA-524) Unification of HTML output from Office, OOXML and Open Document parsers

2010-10-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-524: --- Fix Version/s: (was: 0.8) 0.9 - pushing out to 0.9 -- there's no patch f

[jira] Resolved: (TIKA-490) Support for adding language profiles dynamically

2010-10-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved TIKA-490. Resolution: Fixed - I applied the latest patch from Jan in r1029556. Ultimately we should mov

Re: Hudson build is still unstable: Tika-trunk #395

2010-10-31 Thread Mattmann, Chris A (388J)
Also, Jukka, if you know how to exclude the tests based on the JDK version, please let me know as I've never done that before. I found the codehaus animal sniffer plugin [1], which looks like it might do the trick, but not sure exactly how... Cheers, Chris [1] http://mojo.codehaus.org/animal-s

Re: Hudson build is still unstable: Tika-trunk #395

2010-10-31 Thread Mattmann, Chris A (388J)
Sigh, I take that back. There is some existing jar file that the build for NetCDF includes that is pre-compiled to Java6. I think that's the class files that are giving us trouble since the output of the netcdf build is a JDK5 jar (generated from Ant with -target set to 1.5). Anyone know of a g

Re: Hudson build is still unstable: Tika-trunk #395

2010-10-31 Thread Mattmann, Chris A (388J)
Hey Jukka, NP, I'll fix this. I can push the Java5 version to replace the Java6 version at Maven Central. One sec, and I'll fix it. Cheers, Chris On 10/31/10 6:16 PM, "Jukka Zitting" wrote: Hi, On Mon, Nov 1, 2010 at 3:03 AM, Apache Hudson Server wrote: > See

Re: Hudson build is still unstable: Tika-trunk #395

2010-10-31 Thread Jukka Zitting
Hi, On Mon, Nov 1, 2010 at 3:03 AM, Apache Hudson Server wrote: > See That's a "java.lang.UnsupportedClassVersionError: Bad version number in .class file" error caused by the new NetCDF jar. It's apparently compiled for Java 6 or higher,
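
The "Bad version number in .class file" error described above arises when a JVM loads a class compiled for a newer Java release than it supports. As a minimal sketch (not the approach taken in the thread), the target release of a class can be checked by reading the class file header: the magic number `0xCAFEBABE`, then the minor and major version, where major 49 corresponds to Java 5 and 50 to Java 6. The class name `ClassVersionCheck` and the hand-built header bytes are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Tika or NetCDF.
public class ClassVersionCheck {

    /** Reads the major version from a class file header (49 = Java 5, 50 = Java 6). */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();
        if (magic != 0xCAFEBABE) {
            throw new IOException("Not a class file");
        }
        data.readUnsignedShort();        // minor version, ignored here
        return data.readUnsignedShort(); // major version
    }

    public static void main(String[] args) throws IOException {
        // Illustrative header only: magic number, minor 0, major 50.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 50};
        int major = majorVersion(new ByteArrayInputStream(header));
        System.out.println(major > 49 ? "needs Java 6+" : "ok for Java 5");
    }
}
```

In practice the stream would come from an entry inside the jar in question rather than from a hand-built byte array.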

Hudson build is still unstable: Tika-trunk » Apache Tika parsers #395

2010-10-31 Thread Apache Hudson Server
See

Hudson build is still unstable: Tika-trunk #395

2010-10-31 Thread Apache Hudson Server
See

[jira] Commented: (TIKA-531) xmpTPg:NPages creates invalid XML

2010-10-31 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926796#action_12926796 ] Jukka Zitting commented on TIKA-531: How is the output invalid XML? The name attribute in

[jira] Commented: (TIKA-536) Updated site layout

2010-10-31 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926794#action_12926794 ] Jukka Zitting commented on TIKA-536: OK, I've committed the changes and the updated layou

[jira] Commented: (TIKA-517) java.io.UnsupportedEncodingException with Russian, Chinese, ... document

2010-10-31 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926793#action_12926793 ] Jukka Zitting commented on TIKA-517: The stack trace suggests that this exception is comi

[jira] Resolved: (TIKA-446) Upgrade to PDFBox 1.3.1

2010-10-31 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-446. Resolution: Fixed Done in revision 1029510. > Upgrade to PDFBox 1.3.1 > --- > >

[jira] Updated: (TIKA-446) Upgrade to PDFBox 1.3.1

2010-10-31 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-446: --- Summary: Upgrade to PDFBox 1.3.1 (was: Upgrade to PDFBox 1.3.0) The 1.3.0 release candidate was cancel

Hudson build is back to normal: Tika-trunk » Apache Tika application #394

2010-10-31 Thread Apache Hudson Server
See

Hudson build is still unstable: Tika-trunk » Apache Tika parsers #394

2010-10-31 Thread Apache Hudson Server
See

Hudson build is unstable: Tika-trunk #394

2010-10-31 Thread Apache Hudson Server
See

[jira] Resolved: (TIKA-399) HDF4/5 Tika Parser

2010-10-31 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved TIKA-399. Resolution: Fixed - thanks to the NetCDF-java library, we can parse HDF4/5 too! Fixed in r102

[jira] Commented: (TIKA-407) Push NetCDF4 lib dependency to Maven Central and Update Tika POM

2010-10-31 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926758#action_12926758 ] Jukka Zitting commented on TIKA-407: Thanks a lot for pushing this through! > Push NetCD

[jira] Commented: (TIKA-462) Add Boilerpipe 1.0.4 to Maven central and remove java.net repository from parser pom

2010-10-31 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926730#action_12926730 ] Ken Krugler commented on TIKA-462: -- I created [https://issues.sonatype.org/browse/OSSRH-950]