[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144763#comment-13144763
]
Jukka Zitting commented on TIKA-772:
Can you attach an example document that illustrates
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144772#comment-13144772
]
Joseph Vychtrle commented on TIKA-772:
--
Hey Jukka,
I found it happened only for HTML
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144828#comment-13144828
]
Joseph Vychtrle commented on TIKA-772:
--
MimeType detector doesn't find it, name of the
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144836#comment-13144836
]
Jukka Zitting commented on TIKA-772:
I piped the files to tika-app to prevent it from se
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144840#comment-13144840
]
Joseph Vychtrle commented on TIKA-772:
--
Got it, if I do
{code}tika.detect(TikaInputStr
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144849#comment-13144849
]
Jukka Zitting commented on TIKA-772:
The latter method also makes the .html suffix avail
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144851#comment-13144851
]
Joseph Vychtrle commented on TIKA-772:
--
Weird,
{noformat}
java -jar tika-app-0.10.jar -
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144853#comment-13144853
]
Joseph Vychtrle commented on TIKA-772:
--
But to be honest, it makes sense. Tika doesn't
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144854#comment-13144854
]
Jukka Zitting commented on TIKA-772:
The test case you added prints out "text/html" for
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144855#comment-13144855
]
Joseph Vychtrle commented on TIKA-772:
--
Attached... I'm on Linux, using UTF-8 encoding
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144862#comment-13144862
]
Jukka Zitting commented on TIKA-772:
The metacharacters you mention do sound suspicious.
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144865#comment-13144865
]
Joseph Vychtrle commented on TIKA-772:
--
Funny thing Jukka, I will talk to Cedric Beust