[
https://issues.apache.org/jira/browse/TIKA-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144652#comment-13144652
]
Michael McCandless commented on TIKA-529:
-
This patch looks safe, and avoids crazy
I totally am. I've got some PHP skillz and Python skillz
that I would be willing to throw into the mix here.
Yes, I have some basic skillz on Python, and some advanced skillz on PHP,
so I can help you!
One other thing along these lines I've had in mind for a while:
how cool would it be to
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144772#comment-13144772
]
Joseph Vychtrle commented on TIKA-772:
--
Hey Jukka,
I found it happened only for html
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joseph Vychtrle updated TIKA-772:
-
Attachment: html.zip
media type detection fails for html documents, results in text/plain
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joseph Vychtrle updated TIKA-772:
-
Attachment: tika.png
I don't know then. Take a look at my results with tika v 0.10
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144828#comment-13144828
]
Joseph Vychtrle commented on TIKA-772:
--
MimeType detector doesn't find it, name of the
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144840#comment-13144840
]
Joseph Vychtrle commented on TIKA-772:
--
Got it, if I do
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144849#comment-13144849
]
Jukka Zitting commented on TIKA-772:
The latter method also makes the .html suffix
Hi Chris,
On 4 November 2011 15:42, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
Please vote on releasing this package as Apache Tika 1.0.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Tika PMC votes are cast.
[X] +1 Release this
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144851#comment-13144851
]
Joseph Vychtrle commented on TIKA-772:
--
Weird,
{noformat}
java -jar tika-app-0.10.jar
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joseph Vychtrle updated TIKA-772:
-
Attachment: it.html
media type detection fails for html documents, results in text/plain
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144855#comment-13144855
]
Joseph Vychtrle commented on TIKA-772:
--
Attached... I'm on linux, using UTF-8 encoding
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144862#comment-13144862
]
Jukka Zitting commented on TIKA-772:
The metacharacters you mention do sound suspicious.
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144865#comment-13144865
]
Joseph Vychtrle commented on TIKA-772:
--
Funny thing Jukka, I will talk to Cedric Beust