[
https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130042#comment-17130042
]
suchendra commented on TIKA-3097:
---
Yeah, that's true. If I use SAX DOCX the memory footpri
[
https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129624#comment-17129624
]
Tim Allison commented on TIKA-3097:
---
Java will take as much heap as it can use. If this
[
https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129582#comment-17129582
]
suchendra commented on TIKA-3097:
---
Even for the attached "txt", heap is hitting almost 700
[
https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129574#comment-17129574
]
Tim Allison commented on TIKA-3097:
---
I'm not sure I understand the question. We don't h
[
https://issues.apache.org/jira/browse/TIKA-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129553#comment-17129553
]
suchendra commented on TIKA-3097:
---
[~tallison], is there any streaming solution for all th
I like the regex option, and I _think_ that the anchor at the beginning
(along with the lack of backtracking) shouldn't cause horrible performance
degradation.
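
To make the performance point concrete, here is a minimal sketch of an anchored, non-backtracking header check. The thread does not show the regex under discussion, so the pattern, the class name `HeaderStartSketch`, and the method `startsWithHeader` are all hypothetical and are not Tika's code. It assumes a header line looks like a run of printable ASCII characters (excluding the colon), then a colon, then a space or tab.

```java
import java.util.regex.Pattern;

public class HeaderStartSketch {
    // Hypothetical pattern, not taken from Tika: a header field name
    // (printable ASCII except ':') followed by a colon and a space or tab,
    // anchored at the very start of the input with \A. The possessive
    // quantifier (++) never gives characters back, so a failed match
    // does not backtrack through the field name.
    private static final Pattern HEADER_START =
            Pattern.compile("\\A[\\x21-\\x39\\x3B-\\x7E]++:[ \\t]");

    // Returns true if the text begins with something shaped like "Name: ".
    static boolean startsWithHeader(CharSequence text) {
        return HEADER_START.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(startsWithHeader("Received: from mail.example.org"));
        System.out.println(startsWithHeader("Hello there, this is plain text"));
    }
}
```

Because the match can only begin at offset 0 and the field-name run is consumed possessively, the work done on a non-matching input is bounded by the length of the first whitespace-free run. Note that a shape check like this would also accept ordinary text that happens to start with "Note: ", so on its own it is not a reliable email detector.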
On Tue, Jun 9, 2020 at 7:04 AM Nick Burch wrote:
> Hi All
>
> At the moment, to detect RFC822 emails, we try and check for a bunch of
>
Hi All
At the moment, to detect RFC822 emails, we try and check for a bunch of
common header lines right at the start. If none are found, we check for a
few "could be an unusual header, could be some text" lines, followed by
checking for common headers in a larger area of text below.
For example, starts with