[jira] Commented: (TIKA-539) Encoding detection is too biased by encoding in meta tag

2010-11-06 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928978#action_12928978 ] Benson Margulies commented on TIKA-539: --- Have you checked the insides of nutch for usef

[jira] Commented: (TIKA-539) Encoding detection is too biased by encoding in meta tag

2010-11-06 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929149#action_12929149 ] Ken Krugler commented on TIKA-539: -- Hi Benson, Most of the Tika code came from Nutch origin

Re: Charset SPI

2010-11-06 Thread Ken Krugler
On Nov 4, 2010, at 7:08am, Benson Margulies wrote:
> Have you all ever considered wiring the CharsetDetector to the java.nio.charset.Charset SPI mechanism as an autodetecting charset?
No, I don't remember this coming up. Can you provide any additional information about costs and benefits? Thanks, --

Re: Charset SPI

2010-11-06 Thread Benson Margulies
It provides a tiny convenience. It allows people to use Charset.forName("tikaDetector") and then use the results to apply the detector to any of the APIs that accept a Charset object. I think it's nearly cost-free; it requires a class and an SPI text file. On Sat, Nov 6, 2010 at 3:19 PM, Ken Krugl
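The "class and an SPI text file" idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names `DetectingCharset` and `DetectingCharsetProvider` are hypothetical, and the byte-order-mark sniffing in `guess` is a stand-in for real detection; it does not call Tika's CharsetDetector.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

/** Hypothetical decode-only charset that picks the real encoding on first input. */
final class DetectingCharset extends Charset {
    static final String NAME = "tikaDetector";

    DetectingCharset() { super(NAME, null); }

    @Override public boolean contains(Charset cs) { return cs instanceof DetectingCharset; }

    @Override public boolean canEncode() { return false; }

    @Override public CharsetEncoder newEncoder() {
        throw new UnsupportedOperationException("decode-only charset");
    }

    @Override public CharsetDecoder newDecoder() {
        return new CharsetDecoder(this, 1.0f, 1.0f) {
            private CharsetDecoder delegate; // chosen once per decoding operation

            @Override protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                if (delegate == null) {
                    if (!in.hasRemaining()) return CoderResult.UNDERFLOW;
                    delegate = guess(in).newDecoder();
                }
                // Errors are returned to the outer decoder, which applies its own policy.
                return delegate.decode(in, out, false);
            }

            @Override protected void implReset() { delegate = null; }
        };
    }

    /** Assumption: BOM sniffing stands in for a real detector such as Tika's. */
    private static Charset guess(ByteBuffer in) {
        int p = in.position();
        int n = in.remaining();
        if (n >= 3 && (in.get(p) & 0xFF) == 0xEF && (in.get(p + 1) & 0xFF) == 0xBB
                && (in.get(p + 2) & 0xFF) == 0xBF) {
            in.position(p + 3); // Java's UTF-8 decoder does not strip the BOM itself
            return StandardCharsets.UTF_8;
        }
        if (n >= 2) {
            int b0 = in.get(p) & 0xFF;
            int b1 = in.get(p + 1) & 0xFF;
            if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
                return StandardCharsets.UTF_16; // this decoder consumes the BOM
            }
        }
        return StandardCharsets.UTF_8; // fallback when nothing is recognised
    }
}

/** Hypothetical provider; declare it public in its own file before registering it. */
class DetectingCharsetProvider extends CharsetProvider {
    private final Charset detecting = new DetectingCharset();

    @Override public Iterator<Charset> charsets() {
        return Collections.singletonList(detecting).iterator();
    }

    @Override public Charset charsetForName(String name) {
        return DetectingCharset.NAME.equalsIgnoreCase(name) ? detecting : null;
    }
}
```

The SPI text file would be `META-INF/services/java.nio.charset.spi.CharsetProvider`, containing the provider's fully qualified class name. With that on the classpath, `Charset.forName("tikaDetector")` should return the charset, which can then be passed to APIs such as `new String(bytes, charset)` or `new InputStreamReader(stream, charset)`. One design caveat: the guess is made from whatever bytes arrive in the first decode call, so a streaming reader only gets to inspect its first chunk.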

My ApacheConNA 2010 slides

2010-11-06 Thread Mattmann, Chris A (388J)
are now posted online at Slideshare.net: http://s.apache.org/2ak Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: c

[jira] Updated: (TIKA-543) Remove rome 1.0 dependency on java.net repository

2010-11-06 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-543: --- Remaining Estimate: 0h Original Estimate: 0h > Remove rome 1.0 dependency on java.net repository >

[jira] Commented: (TIKA-543) Remove rome 1.0 dependency on java.net repository

2010-11-06 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929284#action_12929284 ] Jukka Zitting commented on TIKA-543: I switched to Rome 0.8 in revision 1032184. It seems

[jira] Resolved: (TIKA-543) Remove rome 1.0 dependency on java.net repository

2010-11-06 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-543. Resolution: Fixed > Remove rome 1.0 dependency on java.net repository > -

[jira] Commented: (TIKA-543) Remove rome 1.0 dependency on java.net repository

2010-11-06 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929286#action_12929286 ] Chris A. Mattmann commented on TIKA-543: Hey Jukka, thanks. I was just going to try a

[jira] Updated: (TIKA-537) Command line option --list-parsers should list 2nd level parsers below CompositeParsers

2010-11-06 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-537: --- Fix Version/s: 0.8 - get this in real quick. > Command line option --list-parsers should list

[jira] Assigned: (TIKA-537) Command line option --list-parsers should list 2nd level parsers below CompositeParsers

2010-11-06 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned TIKA-537: -- Assignee: Chris A. Mattmann > Command line option --list-parsers should list 2nd level pa

[jira] Resolved: (TIKA-537) Command line option --list-parsers should list 2nd level parsers below CompositeParsers

2010-11-06 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved TIKA-537. Resolution: Fixed - patch applied in r1032187. Thanks Jan, worked great! > Command line opti

Re: 0.8 release: latest status

2010-11-06 Thread Mattmann, Chris A (388J)
OK, I just rolled in TIKA-537. Looks like all issues are fixed, and we're ready for an 0.8 RC! I'll push one out this evening or tomorrow mid-day PST. Cheers, Chris On 11/2/10 6:50 PM, "Mattmann, Chris A (388J)" wrote: Hi Jay, Great. Well I know that Ken is making progress on the Boilerpip

Re: 0.8 release: latest status

2010-11-06 Thread Mattmann, Chris A (388J)
Hi Jan, Sorry I misspelled your name below. Apologies. In the rush to send a reply I replaced the character "n" in your name with "y". I'm a stickler for stuff like that and I know how it makes me feel when someone does it to me. Sorry about that! Cheers, Chris On 11/2/10 6:50 PM, "Mattm