Ethan Wilansky created TIKA-3880:
Summary: Tika not picking-up setByteArrayMaxOverride from tika-config
Key: TIKA-3880
URL: https://issues.apache.org/jira/browse/TIKA-3880
Project: Tika
Issu
[
https://issues.apache.org/jira/browse/TIKA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617878#comment-17617878
]
Ethan Wilansky commented on TIKA-3880:
--
Hi Tim,
Good catch! The parser was not wrapp
[
https://issues.apache.org/jira/browse/TIKA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617878#comment-17617878
]
Ethan Wilansky edited comment on TIKA-3880 at 10/14/22 4:57 PM:
[
https://issues.apache.org/jira/browse/TIKA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617879#comment-17617879
]
Ethan Wilansky commented on TIKA-3880:
--
I'll try the config you posted.
> Tika not p
[
https://issues.apache.org/jira/browse/TIKA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617902#comment-17617902
]
Ethan Wilansky commented on TIKA-3880:
--
Thanks for the reference about large file pro
[
https://issues.apache.org/jira/browse/TIKA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617934#comment-17617934
]
Ethan Wilansky commented on TIKA-3880:
--
Hi Tim,
I see I couldn't add a comment direc
[
https://issues.apache.org/jira/browse/TIKA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Wilansky resolved TIKA-3880.
--
Fix Version/s: 2.5.0
Resolution: Resolved
Confirmed that the setByteArrayMaxOverride sett
Ethan Wilansky created TIKA-3890:
Summary: Identifying an efficient approach for getting page count prior to running an extraction
Key: TIKA-3890
URL: https://issues.apache.org/jira/browse/TIKA-3890
P
[
https://issues.apache.org/jira/browse/TIKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Wilansky updated TIKA-3890:
-
Description:
Tika is doing a great job with text extraction, until we encounter an Office
documen
[
https://issues.apache.org/jira/browse/TIKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620630#comment-17620630
]
Ethan Wilansky commented on TIKA-3890:
--
Aha, I'll have to give Apache POI a try. Than
[
https://issues.apache.org/jira/browse/TIKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621155#comment-17621155
]
Ethan Wilansky commented on TIKA-3890:
--
Thanks Nick and Tim. This is really helpful.
[
https://issues.apache.org/jira/browse/TIKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621322#comment-17621322
]
Ethan Wilansky commented on TIKA-3890:
--
Great information, thanks. I'll close this is
[
https://issues.apache.org/jira/browse/TIKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Wilansky closed TIKA-3890.
Fix Version/s: 2.5.0
Resolution: Fixed
> Identifying an efficient approach for getting page c
Ethan Wilansky created TIKA-3894:
Summary: Documentation update needed
Key: TIKA-3894
URL: https://issues.apache.org/jira/browse/TIKA-3894
Project: Tika
Issue Type: Improvement
Comp
[
https://issues.apache.org/jira/browse/TIKA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Wilansky closed TIKA-3880.
> Tika not picking-up setByteArrayMaxOverride from tika-config
> ---
[
https://issues.apache.org/jira/browse/TIKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Wilansky closed TIKA-3894.
Thanks Tim!
> Documentation update needed
> ---
>
> Key: TIKA-3