[jira] [Commented] (TIKA-761) Provide version number by CLI argument -V

2011-10-25 Thread Ingo Renner (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134976#comment-13134976 ] Ingo Renner commented on TIKA-761: Got the NPE resolved, it was caused by the changes to

[jira] [Commented] (TIKA-761) Provide version number by CLI argument -V

2011-10-25 Thread Ingo Renner (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135106#comment-13135106 ] Ingo Renner commented on TIKA-761: Hi Nick and Jukka, some update on the META-INF

[jira] [Commented] (TIKA-761) Provide version number by CLI argument -V

2011-10-25 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135129#comment-13135129 ] Jukka Zitting commented on TIKA-761: I'd simply hardcode the properties file path as

[jira] [Commented] (TIKA-761) Provide version number by CLI argument -V

2011-10-25 Thread Ingo Renner (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135162#comment-13135162 ] Ingo Renner commented on TIKA-761: Sure, but the dots will still be replaced with slashes

[jira] [Commented] (TIKA-761) Provide version number by CLI argument -V

2011-10-25 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135167#comment-13135167 ] Jukka Zitting commented on TIKA-761: bq. the dots will still be replaced with slashes

[jira] [Commented] (TIKA-761) Provide version number by CLI argument -V

2011-10-25 Thread Ingo Renner (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135181#comment-13135181 ] Ingo Renner commented on TIKA-761: Oh wow, indeed works. Patch coming!

Re: Google's Compact Language Detector

2011-10-25 Thread Robert Muir
On Tue, Oct 25, 2011 at 12:12 PM, Michael McCandless luc...@mikemccandless.com wrote: Tika seems to have a lot of trouble with Spanish (confuses w/ Galician) and Danish (confuses with Dutch). s/Dutch/Norwegian/ -- lucidimagination.com

Re: Google's Compact Language Detector

2011-10-25 Thread Michael McCandless
On Tue, Oct 25, 2011 at 12:32 PM, Robert Muir rcm...@gmail.com wrote: On Tue, Oct 25, 2011 at 12:12 PM, Michael McCandless luc...@mikemccandless.com wrote: Tika seems to have a lot of trouble with Spanish (confuses w/ Galician) and Danish (confuses with Dutch). s/Dutch/Norwegian/ Woops,

Re: Google's Compact Language Detector

2011-10-25 Thread Ken Krugler
On Oct 25, 2011, at 6:12pm, Michael McCandless wrote: OK I posted the 3rd post about CLD, this time testing perf by comparing to Tika and language-detection (Google Code project): http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html Net/net all three do

[jira] [Updated] (TIKA-605) Tika GDAL parser

2011-10-25 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-605: Fix Version/s: (was: 1.0) 1.1 - push out to 1.1: prep for 1.0.

[jira] [Updated] (TIKA-754) Automatic line break insertion (BR element) instead of '\n' in XHTMLContentHandler

2011-10-25 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-754: Fix Version/s: (was: 1.0) 1.1 - push out to 1.1: prep for 1.0.

[jira] [Updated] (TIKA-757) Address TODOs when we upgrade to next POI release (3.8 beta 5)

2011-10-25 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-757: Fix Version/s: (was: 1.0) 1.1 - push out to 1.1: prep for 1.0.

[jira] [Updated] (TIKA-758) Address TODOs when we upgrade to next PDFBox release

2011-10-25 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-758: Fix Version/s: (was: 1.0) 1.1 - push out to 1.1: prep for 1.0.

[jira] [Updated] (TIKA-715) Some parsers produce non-well-formed XHTML SAX events

2011-10-25 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-715: Fix Version/s: (was: 1.0) 1.1 - push out to 1.1: prep for 1.0.

[jira] [Updated] (TIKA-565) Improved OSGi bundling

2011-10-25 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-565: Fix Version/s: (was: 1.0) 1.1 - push out to 1.1: prep for 1.0.

Re: Tika is waiting for ODFToolkit to improve ODF file format processing

2011-10-25 Thread Rob Weir
On Tue, Oct 25, 2011 at 1:03 PM, Michael McCandless luc...@mikemccandless.com wrote: On Mon, Oct 24, 2011 at 9:17 AM, Rob Weir robw...@apache.org wrote: On Mon, Oct 24, 2011 at 4:54 AM, Devin Han devin...@apache.org wrote: I saw this issue in Tika: OpenOffice parser: master footer text isn't

Tika 1.0 RC?

2011-10-25 Thread Mattmann, Chris A (388J)
Hey Guys, I created a 1.1 version in JIRA and pushed all open (~13) issues for 1.0 to 1.1. We now have 32 issues resolved in the current 1.0. WDYT? Good enough for a 1.0 release? I'm happy to spin the RC tonight or in the next day (PDT). Any objections? Cheers, Chris