[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144763#comment-13144763 ] Jukka Zitting commented on TIKA-772: Can you attach an example document that illustrates

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144772#comment-13144772 ] Joseph Vychtrle commented on TIKA-772: Hey Jukka, I found it happened only for html

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144828#comment-13144828 ] Joseph Vychtrle commented on TIKA-772: MimeType detector doesn't find it, name of the

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144836#comment-13144836 ] Jukka Zitting commented on TIKA-772: I piped the files to tika-app to prevent it from se

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144840#comment-13144840 ] Joseph Vychtrle commented on TIKA-772: Got it, if I do {code}tika.detect(TikaInputStr

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144849#comment-13144849 ] Jukka Zitting commented on TIKA-772: The latter method makes also the .html suffix avail

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144851#comment-13144851 ] Joseph Vychtrle commented on TIKA-772: Weird, {noformat} java -jar tika-app-0.10.jar -

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144853#comment-13144853 ] Joseph Vychtrle commented on TIKA-772: But to be honest, it makes sense. Tika doesn't

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144854#comment-13144854 ] Jukka Zitting commented on TIKA-772: The test case you added prints out "text/html" for

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144855#comment-13144855 ] Joseph Vychtrle commented on TIKA-772: Attached... I'm on Linux, using UTF-8 encoding

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Jukka Zitting (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144862#comment-13144862 ] Jukka Zitting commented on TIKA-772: The metacharacters you mention do sound suspicious.

[jira] [Commented] (TIKA-772) media type detection fails for html documents, results in text/plain instead of text/html

2011-11-05 Thread Joseph Vychtrle (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144865#comment-13144865 ] Joseph Vychtrle commented on TIKA-772: Funny thing, Jukka, I will talk to Cedric Beust