[jira] [Created] (TIKA-3880) Tika not picking-up setByteArrayMaxOverride from tika-config

2022-10-14 Thread Ethan Wilansky (Jira)
Ethan Wilansky created TIKA-3880: Summary: Tika not picking-up setByteArrayMaxOverride from tika-config Key: TIKA-3880 URL: https://issues.apache.org/jira/browse/TIKA-3880 Project: Tika Issu

[jira] [Commented] (TIKA-3880) Tika not picking-up setByteArrayMaxOverride from tika-config

2022-10-14 Thread Ethan Wilansky (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617878#comment-17617878 ] Ethan Wilansky commented on TIKA-3880: Hi Tim, Good catch! The parser was not wrapp

[jira] [Comment Edited] (TIKA-3880) Tika not picking-up setByteArrayMaxOverride from tika-config

2022-10-14 Thread Ethan Wilansky (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617878#comment-17617878 ] Ethan Wilansky edited comment on TIKA-3880 at 10/14/22 4:57 PM:

[jira] [Commented] (TIKA-3880) Tika not picking-up setByteArrayMaxOverride from tika-config

2022-10-14 Thread Ethan Wilansky (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617879#comment-17617879 ] Ethan Wilansky commented on TIKA-3880: I'll try the config you posted. > Tika not p

[jira] [Commented] (TIKA-3880) Tika not picking-up setByteArrayMaxOverride from tika-config

2022-10-14 Thread Ethan Wilansky (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617902#comment-17617902 ] Ethan Wilansky commented on TIKA-3880: Thanks for the reference about large file pro

[jira] [Commented] (TIKA-3880) Tika not picking-up setByteArrayMaxOverride from tika-config

2022-10-14 Thread Ethan Wilansky (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617934#comment-17617934 ] Ethan Wilansky commented on TIKA-3880: Hi Tim, I see I couldn't add a comment direc

[jira] [Resolved] (TIKA-3880) Tika not picking-up setByteArrayMaxOverride from tika-config

2022-10-17 Thread Ethan Wilansky (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Wilansky resolved TIKA-3880. Fix Version/s: 2.5.0 Resolution: Resolved Confirmed that the setByteArrayMaxOverride sett

[jira] [Created] (TIKA-3890) Identifying an efficient approach for getting page count prior to running an extraction

2022-10-19 Thread Ethan Wilansky (Jira)
Ethan Wilansky created TIKA-3890: Summary: Identifying an efficient approach for getting page count prior to running an extraction Key: TIKA-3890 URL: https://issues.apache.org/jira/browse/TIKA-3890 P

[jira] [Updated] (TIKA-3890) Identifying an efficient approach for getting page count prior to running an extraction

2022-10-19 Thread Ethan Wilansky (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Wilansky updated TIKA-3890: Description: Tika is doing a great job with text extraction, until we encounter an Office documen

[jira] [Commented] (TIKA-3890) Identifying an efficient approach for getting page count prior to running an extraction

2022-10-19 Thread Ethan Wilansky (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620630#comment-17620630 ] Ethan Wilansky commented on TIKA-3890: Aha, I'll have to give Apache POI a try. Than

[jira] [Commented] (TIKA-3890) Identifying an efficient approach for getting page count prior to running an extraction

2022-10-20 Thread Ethan Wilansky (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621155#comment-17621155 ] Ethan Wilansky commented on TIKA-3890: Thanks Nick and Tim. This is really helpful.

[jira] [Commented] (TIKA-3890) Identifying an efficient approach for getting page count prior to running an extraction

2022-10-20 Thread Ethan Wilansky (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621322#comment-17621322 ] Ethan Wilansky commented on TIKA-3890: Great information, thanks. I'll close this is

[jira] [Closed] (TIKA-3890) Identifying an efficient approach for getting page count prior to running an extraction

2022-10-20 Thread Ethan Wilansky (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Wilansky closed TIKA-3890. Fix Version/s: 2.5.0 Resolution: Fixed > Identifying an efficient approach for getting page c

[jira] [Created] (TIKA-3894) Documentation update needed

2022-10-20 Thread Ethan Wilansky (Jira)
Ethan Wilansky created TIKA-3894: Summary: Documentation update needed Key: TIKA-3894 URL: https://issues.apache.org/jira/browse/TIKA-3894 Project: Tika Issue Type: Improvement Comp

[jira] [Closed] (TIKA-3880) Tika not picking-up setByteArrayMaxOverride from tika-config

2022-10-21 Thread Ethan Wilansky (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Wilansky closed TIKA-3880. > Tika not picking-up setByteArrayMaxOverride from tika-config > ---

[jira] [Closed] (TIKA-3894) Documentation update needed

2022-10-21 Thread Ethan Wilansky (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Wilansky closed TIKA-3894. Thanks Tim! > Documentation update needed > --- > > Key: TIKA-3