[ https://issues.apache.org/jira/browse/NUTCH-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved NUTCH-3000.
Fix Version/s: 1.20
Resolution: Fixed
> protocol-selenium returns only the body,strips off
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved NUTCH-3001.
Fix Version/s: 1.20
Resolution: Fixed
> protocol-selenium requires Content-Type header
>
[ https://issues.apache.org/jira/browse/NUTCH-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764803#comment-17764803 ]
Hudson commented on NUTCH-3000:
---
SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #110 (See
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764802#comment-17764802 ]
Hudson commented on NUTCH-3001:
---
SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #110 (See
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764792#comment-17764792 ]
ASF GitHub Bot commented on NUTCH-3001:
---
tballison merged PR #774:
URL:
[ https://issues.apache.org/jira/browse/NUTCH-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764791#comment-17764791 ]
ASF GitHub Bot commented on NUTCH-3000:
---
tballison merged PR #773:
URL:
tballison merged PR #774:
URL: https://github.com/apache/nutch/pull/774
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail:
tballison merged PR #773:
URL: https://github.com/apache/nutch/pull/773
--
+1
Since any23 also depends on tika-core, the plugin is likely to break if we
upgrade to a more recent Tika version in Nutch core and the parse-tika plugin.
~Sebastian
On 9/13/23 16:50, Tim Allison wrote:
All,
I opened https://issues.apache.org/jira/browse/NUTCH-2998
[ https://issues.apache.org/jira/browse/NUTCH-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764741#comment-17764741 ]
Tim Allison commented on NUTCH-2998:
Sorry, I botched the title in the PR:
tballison commented on PR #775:
URL: https://github.com/apache/nutch/pull/775#issuecomment-1717820655
When I build this, I get this harmless (?) warning in
`src/plugin/logs/hadoop.log`:
```
2023-02-24 10:07:39,218 WARN o.a.n.p.PluginManifestParser [main] Error while loading
```
tballison opened a new pull request, #775:
URL: https://github.com/apache/nutch/pull/775
Thanks for your contribution to [Apache Nutch](https://nutch.apache.org/)!
Your help is appreciated!
Before opening the pull request, please verify that
* there is an open issue on the [Nutch
All,
I opened https://issues.apache.org/jira/browse/NUTCH-2998 a few weeks
ago. Any23 was moved to the attic in June. Unless there are objections, I
propose removing it from Nutch before the next release.
Any objections?
Best,
Tim
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764722#comment-17764722 ]
ASF GitHub Bot commented on NUTCH-2978:
---
tballison commented on PR #772:
URL: https://github.com/apache/nutch/pull/772#issuecomment-1717765669
If folks could test this out on their workloads, that'd be fantastic! It
works on mine, but I'm really hesitant to merge until someone else runs it.
Thank you!
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764720#comment-17764720 ]
ASF GitHub Bot commented on NUTCH-3001:
---
tballison opened a new pull request, #774:
URL: https://github.com/apache/nutch/pull/774
…in the header
[ https://issues.apache.org/jira/browse/NUTCH-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764718#comment-17764718 ]
ASF GitHub Bot commented on NUTCH-3000:
---
tballison opened a new pull request, #773:
URL: https://github.com/apache/nutch/pull/773
…ust the inner body element.
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764705#comment-17764705 ]
Tim Allison commented on NUTCH-2978:
I haven't tested in hadoop. I've just run it locally, and, for
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764699#comment-17764699 ]
Markus Jelsma commented on NUTCH-2978:
--
You managed to get it up and running, as well when deployed
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated NUTCH-3001:
---
Description:
It looks like the selenium protocol requires that there be a content-type
header.
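For illustration only: the description above says protocol-selenium expects a Content-Type header. The sketch below shows one way a fetcher could avoid a hard dependency on it: use the header's media type when present, and fall back to a crude sniff of the first bytes when it is missing. The class name, method names, the set of HTML media types and the sniffing rule are all hypothetical; the thread does not show how the Nutch fix handles a missing header.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

public class ContentTypeFallback {

    // Returns the lower-cased media type (parameters such as charset removed),
    // or null when no Content-Type header is present. Header names are matched
    // case-insensitively, as HTTP requires.
    static String mediaType(Map<String, String> headers) {
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (e.getKey().equalsIgnoreCase("Content-Type") && e.getValue() != null) {
                String value = e.getValue();
                int semi = value.indexOf(';');
                String type = semi >= 0 ? value.substring(0, semi) : value;
                return type.trim().toLowerCase(Locale.ROOT);
            }
        }
        return null;
    }

    // Hypothetical decision: render in the browser only for HTML, and do not
    // fail when the header is missing.
    static boolean shouldRenderInBrowser(Map<String, String> headers, byte[] body) {
        String type = mediaType(headers);
        if (type != null) {
            return type.equals("text/html") || type.equals("application/xhtml+xml");
        }
        if (body == null) {
            return false;
        }
        // No header: crude sniff of the first bytes (an assumption, not Nutch's logic).
        String start = new String(body, 0, Math.min(body.length, 512), StandardCharsets.ISO_8859_1)
                .trim().toLowerCase(Locale.ROOT);
        return start.startsWith("<!doctype html") || start.startsWith("<html");
    }

    public static void main(String[] args) {
        byte[] html = "<!DOCTYPE html><html><body>hi</body></html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(shouldRenderInBrowser(Map.of("Content-Type", "text/html; charset=UTF-8"), html));
        System.out.println(shouldRenderInBrowser(Map.of(), html));
        System.out.println(shouldRenderInBrowser(Map.of("content-type", "image/png"), html));
    }
}
```

The three calls in `main` print `true` (header present), `true` (header missing, markup sniffed) and `false` (non-HTML type). A real implementation would more likely reuse Tika-based MIME detection than a hand-rolled prefix check.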
[ https://issues.apache.org/jira/browse/NUTCH-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764697#comment-17764697 ]
Markus Jelsma commented on NUTCH-3000:
--
Yes, this is a bit odd indeed. +1
> protocol-selenium
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated NUTCH-3001:
---
Priority: Minor (was: Major)
> protocol-selenium requires Content-Type header
>
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764698#comment-17764698 ]
Tim Allison commented on NUTCH-3001:
Or is the notion that if the selenium protocol doesn't pull any
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated NUTCH-3001:
---
Description:
It looks like the selenium protocol requires that there be content-type.
The logic
Tim Allison created NUTCH-3001:
--
Summary: protocol-selenium requires Content-Type header
Key: NUTCH-3001
URL: https://issues.apache.org/jira/browse/NUTCH-3001
Project: Nutch
Issue Type: Bug
[ https://issues.apache.org/jira/browse/NUTCH-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764692#comment-17764692 ]
Sebastian Nagel commented on NUTCH-3000:
+1 Yes, the full HTML seems the best choice for the
Tim Allison created NUTCH-3000:
--
Summary: protocol-selenium returns only the body,strips off the
element
Key: NUTCH-3000
URL: https://issues.apache.org/jira/browse/NUTCH-3000
Project: Nutch
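For illustration only: NUTCH-3000 reports that protocol-selenium handed back just the body of the rendered page. The sketch below uses plain string handling (no Selenium) to show what is lost when only the inner HTML of `<body>` is kept: anything in `<head>`, such as `<title>` or a robots `<meta>` tag, disappears before parsing. The class and helper names are hypothetical and not taken from the Nutch code base. In Selenium's Java API, `WebDriver.getPageSource()` returns the source of the current page, which is one way to keep the full document; the thread does not say whether the merged fix uses it.

```java
import java.util.Locale;

public class BodyVsFullHtml {

    // Hypothetical helper: keeps only what sits between <body ...> and </body>,
    // similar in effect to reading the body element's innerHTML.
    static String bodyInnerHtml(String fullHtml) {
        String lower = fullHtml.toLowerCase(Locale.ROOT);
        int open = lower.indexOf("<body");
        if (open < 0) {
            return fullHtml; // no <body> tag found: return the input unchanged
        }
        int start = lower.indexOf('>', open) + 1; // first character after the opening tag
        int end = lower.lastIndexOf("</body>");
        if (end < start) {
            end = fullHtml.length(); // unclosed body: keep everything after the tag
        }
        return fullHtml.substring(start, end);
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Example</title>"
                + "<meta name=\"robots\" content=\"noindex\"></head>"
                + "<body><p>Hello</p></body></html>";

        String bodyOnly = bodyInnerHtml(page);
        System.out.println("body only : " + bodyOnly);
        System.out.println("title kept (full page)? " + page.contains("<title>"));
        System.out.println("title kept (body only)? " + bodyOnly.contains("<title>"));
    }
}
```

Running `main` prints `body only : <p>Hello</p>`, then `true` for the full page and `false` for the body-only string, which is why returning the full HTML is the safer choice for downstream parsing.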
[ https://issues.apache.org/jira/browse/NUTCH-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764672#comment-17764672 ]
Sebastian Nagel edited comment on NUTCH-2998 at 9/13/23 1:26 PM:
-
+1
>
[ https://issues.apache.org/jira/browse/NUTCH-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764672#comment-17764672 ]
Sebastian Nagel commented on NUTCH-2998:
+1
> Remove the Any23 plugin
> ---