[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545383#comment-17545383
]
Tim Allison commented on TIKA-3710:
---
Thank you, [~lfcnassif]!
> HTML document detected
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545156#comment-17545156
]
Luís Filipe Nassif commented on TIKA-3710:
---
Seems good to me [~tallison]!
> HTM
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545044#comment-17545044
]
Hudson commented on TIKA-3710:
---
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #6
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544964#comment-17544964
]
Tim Allison commented on TIKA-3710:
---
I just committed and pushed this. Please let me kn
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539607#comment-17539607
]
Tim Allison commented on TIKA-3710:
---
The current main block is 40, which is intentionall
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539594#comment-17539594
]
Nick Burch commented on TIKA-3710:
---
As a "normal" html file wouldn't start with these sni
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539590#comment-17539590
]
Tim Allison commented on TIKA-3710:
---
Sounds good. What do you think of breaking those o
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539582#comment-17539582
]
Nick Burch commented on TIKA-3710:
---
I was thinking we'd do (open)h1(close) or (open)h1(sp
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539580#comment-17539580
]
Tim Allison commented on TIKA-3710:
---
This works on the test file:
{noformat}
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539574#comment-17539574
]
Tim Allison commented on TIKA-3710:
---
Sorry, that comment must have referred to the patte
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539054#comment-17539054
]
Sam Stephens commented on TIKA-3710:
{quote}The h1 isn't quite as unique as we might l
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539051#comment-17539051
]
Sam Stephens commented on TIKA-3710:
Is it valid for a message/rfc822 message to have
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538974#comment-17538974
]
Tim Allison commented on TIKA-3710:
---
The hiccup is this point in the mimetypes.xml file.
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538963#comment-17538963
]
Tim Allison commented on TIKA-3710:
---
Thank you, [~nick]. I was being imprecise on {{h1}}
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538896#comment-17538896
]
Nick Burch commented on TIKA-3710:
---
The h1 isn't quite as unique as we might like, and ma
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538885#comment-17538885
]
Tim Allison commented on TIKA-3710:
---
As I look at our mime type for html, we do include
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538524#comment-17538524
]
Sam Stephens commented on TIKA-3710:
Note that I exclude org.apache.tika.parser.mail.R
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516142#comment-17516142
]
Sam Stephens commented on TIKA-3710:
The HTML document is exactly what you see there;
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515921#comment-17515921
]
Tim Allison commented on TIKA-3710:
---
Did the original html file actually have an html he