[
https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678993#comment-17678993
]
ASF GitHub Bot commented on NUTCH-2980:
---
KamilMroczek opened a new pull request, #753:
URL: https://github.com/apache/nutch/pull/753
- Disabled the PhantomJS driver, as it was causing problems casting
TakesScreenshot to HtmlUnitWebDriver, and the project has been archived since 2018
- Improved README setup instructions for IntelliJ
[
https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kamil Mroczek updated NUTCH-2980:
---
Description:
The Selenium version is quite old and had some issues processing a website. Once I
Kamil Mroczek created NUTCH-2980:
Summary: Upgrade Selenium Java to 4.7.2
Key: NUTCH-2980
URL: https://issues.apache.org/jira/browse/NUTCH-2980
Project: Nutch
Issue Type: Improvement
> This makes some sense if you do not know anything about the URL.
> - a HEAD request could do almost the same
> - often one knows whether there are only HTML pages or also PDFs, zip
>   files, and other stuff not suitable for Selenium. Could make the HEAD
>   request optional.
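The HEAD-request idea quoted above can be sketched as follows: probe the URL's Content-Type with a HEAD request, send only HTML-like responses to Selenium, and allow the probe to be switched off when the crawl is known to contain only HTML. This is a minimal sketch, not the plugin's code. It assumes Java 11+ and uses the JDK's java.net.http.HttpClient, which may not be the HttpClient the plugin uses. The class name SeleniumRouting, its methods, the MIME-type allow-list, and the probeWithHead flag are all hypothetical and are not Nutch configuration properties.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper class; not part of the Nutch Selenium plugin.
final class SeleniumRouting {

    // Hypothetical allow-list of MIME types worth rendering in a browser.
    private static final Set<String> RENDERABLE_TYPES =
            Set.of("text/html", "application/xhtml+xml");

    /** Strips parameters such as "; charset=UTF-8" and lower-cases the type. */
    static String baseMimeType(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** True if a response with this Content-Type should be rendered by Selenium. */
    static boolean shouldUseSelenium(String contentType) {
        return RENDERABLE_TYPES.contains(baseMimeType(contentType));
    }

    /** Sends a HEAD request (no body is downloaded) and returns the Content-Type header, if present. */
    static Optional<String> headContentType(HttpClient client, String url)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                // HttpRequest.Builder.HEAD() only exists since Java 18, so use method().
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .timeout(Duration.ofSeconds(10))
                .build();
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
        return response.headers().firstValue("Content-Type");
    }

    /**
     * Decides whether a URL goes to Selenium. With probeWithHead == false
     * (e.g. an HTML-only crawl) every URL is routed to Selenium with no extra
     * round trip. If the server omits Content-Type, this sketch falls back to Selenium.
     */
    static boolean routeToSelenium(HttpClient client, String url, boolean probeWithHead)
            throws IOException, InterruptedException {
        if (!probeWithHead) {
            return true;
        }
        return headContentType(client, url)
                .map(SeleniumRouting::shouldUseSelenium)
                .orElse(true);
    }
}
```

Falling back to Selenium when no Content-Type is returned is one possible choice, not a recommendation from the thread. Note also that some servers answer HEAD with 405 or with headers that differ from GET, so a real implementation would likely need a GET fallback.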
Ah crap, i
Hi Kamil, hi Markus,
Upgrading the Selenium plugin is much appreciated!
> Besides that, the plugin also needs some overhaul.
Definitely.
> It currently first downloads the URL with HttpClient, and then, depending on
> the MIME type, it may or may not forward the URL to Selenium so it can be
>