[jira] [Commented] (TIKA-529) IBM420 charset detection's isLamAlef is allocation-happy

2011-11-05 Thread Michael McCandless (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144652#comment-13144652 ] Michael McCandless commented on TIKA-529: This patch looks safe, and avoids crazy

Re: Multilingual Tika

2011-11-05 Thread Jérôme Charron
I totally am. I've got some PHP skillz and Python skillz that I would be willing to throw into the mix here. Yes, I have some basic skillz on Python, and some advanced skillz on PHP, so I can help you! One other thing along these lines I've had in mind for a while: how cool would it be to

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144772#comment-13144772 ] Joseph Vychtrle commented on TIKA-772: Hey Jukka, I found it happened only for html

[jira] [Updated] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Vychtrle updated TIKA-772: Attachment: html.zip media type detection fails for html documents, results in text/plain

[jira] [Updated] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Vychtrle updated TIKA-772: Attachment: tika.png I don't know then. Take a look at my results with tika v 0.10

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144828#comment-13144828 ] Joseph Vychtrle commented on TIKA-772: MimeType detector doesn't find it, name of the

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144840#comment-13144840 ] Joseph Vychtrle commented on TIKA-772: Got it, if I do

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144849#comment-13144849 ] Jukka Zitting commented on TIKA-772: The latter method makes also the .html suffix

Re: [VOTE] Apache Tika 1.0 release rc #1

2011-11-05 Thread Dave Meikle
Hi Chris, On 4 November 2011 15:42, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Please vote on releasing this package as Apache Tika 1.0. The vote is open for the next 72 hours and passes if a majority of at least three +1 Tika PMC votes are cast. [X] +1 Release this

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144851#comment-13144851 ] Joseph Vychtrle commented on TIKA-772: Weird, {noformat} java -jar tika-app-0.10.jar

[jira] [Updated] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Vychtrle updated TIKA-772: Attachment: it.html media type detection fails for html documents, results in text/plain

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144855#comment-13144855 ] Joseph Vychtrle commented on TIKA-772: Attached... I'm on linux, using UTF-8 encoding

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144862#comment-13144862 ] Jukka Zitting commented on TIKA-772: The metacharacters you mention do sound suspicious.

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144865#comment-13144865 ] Joseph Vychtrle commented on TIKA-772: Funny thing Jukka, I will talk to Cedric Beust