[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770553#comment-17770553
]
ASF GitHub Bot commented on NUTCH-2959:
---
tballison commented on PR #776:
URL:
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770552#comment-17770552
]
ASF GitHub Bot commented on NUTCH-2959:
---
sebastian-nagel commented on PR #776:
URL:
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770540#comment-17770540
]
ASF GitHub Bot commented on NUTCH-2959:
---
tballison commented on PR #776:
URL: https://github.com/apache/nutch/pull/776#issuecomment-1741263080
Alright, the only thing that I think _might_ work is Tika shading commons-io
in tika-app, and then Nutch uses tika-app instead of the individual
parser-modules etc. for parser-tika.
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770502#comment-17770502
]
ASF GitHub Bot commented on NUTCH-2959:
---
tballison commented on PR #776:
URL: https://github.com/apache/nutch/pull/776#issuecomment-1741143346
Stepping away from the keyboard. :sob:
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770500#comment-17770500
]
ASF GitHub Bot commented on NUTCH-2959:
---
tballison commented on PR #776:
URL:
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770493#comment-17770493
]
ASF GitHub Bot commented on NUTCH-2959:
---
tballison commented on PR #776:
URL: https://github.com/apache/nutch/pull/776#issuecomment-1741125973
There is just no winning...
We just upgraded POI to 5.2.4, and it uses a bunch of the newer commons-io
methods. If we downgrade POI to 5.2.3, we get a clean build of Tika with
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770481#comment-17770481
]
ASF GitHub Bot commented on NUTCH-2959:
---
tballison commented on PR #776:
URL: https://github.com/apache/nutch/pull/776#issuecomment-1741061515
I reverted to 2.2.1, but that's not far enough back: there were 222
parse failures, many with the wrap problem. I reverted to 2.0.0, and then
had 85 parse failures again. This
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770477#comment-17770477
]
ASF GitHub Bot commented on NUTCH-2959:
---
tballison commented on PR #776:
URL: https://github.com/apache/nutch/pull/776#issuecomment-1741040471
> see the comments in [test_tika_parser.sh](https://github.com/sebastian-nagel/nutch-test-single-node-cluster/blob/master/test_tika_parser.sh)
Sorry! Yep, saw that too late.
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770476#comment-17770476
]
ASF GitHub Bot commented on NUTCH-2959:
---
tballison commented on PR #776:
URL: https://github.com/apache/nutch/pull/776#issuecomment-1741039619
With the update to Tika 2.9.1-SNAPSHOT, I get 85 failed parses; most of them
are either encrypted documents or "can't retrieve Tika Parser for x"
> Migrating javax->jakarta has been quite a chore on Tika because of
> dependencies. Given back-compat issues with hadoop, is this even on the
> horizon for Nutch?
Good point. I think we are pretty free to replace javax packages in Nutch core
and plugins - they're used in multiple classes.
If
[
https://issues.apache.org/jira/browse/NUTCH-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770320#comment-17770320
]
Sebastian Nagel commented on NUTCH-3006:
> revert CloseShieldInputStream.wrap(), which I think
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770318#comment-17770318
]
ASF GitHub Bot commented on NUTCH-2959:
---
sebastian-nagel commented on PR #776:
URL: https://github.com/apache/nutch/pull/776#issuecomment-1740397046
> what do I use for the tika seeds file? Are you using our github repo, or the
> tika-parsers-common package specifically
see the comments in