[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-01-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678993#comment-17678993 ]
ASF GitHub Bot commented on NUTCH-2980:
KamilMroczek opened a new pull request, #753: URL:

[GitHub] [nutch] KamilMroczek opened a new pull request, #753: NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit

2023-01-19 Thread GitBox
KamilMroczek opened a new pull request, #753: URL: https://github.com/apache/nutch/pull/753
- Disabled phantomJS driver as it was causing problems casting TakeScreenshot to HtmlUnitWebDriver and the project has been archived since 2018
- Improved README setup instructions for IntelliJ

[jira] [Updated] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-01-19 Thread Kamil Mroczek (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kamil Mroczek updated NUTCH-2980:
Description: Selenium version is quite old and had some issues processing a website. Once I

[jira] [Created] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-01-19 Thread Kamil Mroczek (Jira)
Kamil Mroczek created NUTCH-2980:
Summary: Upgrade Selenium Java to 4.7.2
Key: NUTCH-2980
URL: https://issues.apache.org/jira/browse/NUTCH-2980
Project: Nutch
Issue Type: Improvement

Re: Upgrading Selenium

2023-01-19 Thread Markus Jelsma
> This makes some sense if you do not know anything about the URL.
> - a HEAD request could do almost the same
> - often one knows whether there are only HTML pages or also PDFs, zip files,
>   and other stuff not suitable for Selenium. Could make the HEAD request
>   optional.
Ah crap, i

Re: Upgrading Selenium

2023-01-19 Thread Sebastian Nagel
Hi Kamil, hi Markus,
upgrading the Selenium plugin is very appreciated!
> Besides that, the plugin also needs some overhaul.
Definitely.
> It currently first downloads the URL with HttpClient, and then, depending on
> MIME-type, it may or may not forward the URL to Selenium so it can be
>
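
The HEAD-request idea raised in the "Upgrading Selenium" thread could look roughly like the sketch below. It is not code from the Nutch Selenium plugin: the class `MimeGate`, its method names, the set of renderable MIME types, and the 10-second timeout are all hypothetical and chosen for illustration. It uses the JDK's `java.net.http` client (Java 11+) to fetch only the response headers and then decides, from `Content-Type`, whether the URL looks suitable for a browser-based fetch. Treating a missing `Content-Type` as "do not render" is also an assumption; a real plugin might prefer to fall back to a full download.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not part of Nutch: decides whether a URL is worth sending to Selenium. */
final class MimeGate {
    // Assumed set of MIME types that a browser-based renderer can usefully process.
    private static final Set<String> RENDERABLE = Set.of("text/html", "application/xhtml+xml");

    /** Pure decision on a raw Content-Type header value (may be null). */
    static boolean shouldRenderWithSelenium(String contentType) {
        if (contentType == null) {
            return false; // assumption: unknown type is not forwarded
        }
        // Strip parameters such as "; charset=UTF-8" before comparing.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return RENDERABLE.contains(mime);
    }

    /** Issues a HEAD request and applies the decision above to the response's Content-Type. */
    static boolean probe(HttpClient client, String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .timeout(Duration.ofSeconds(10))
                .build();
        // The body is discarded; only the headers matter here.
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        return shouldRenderWithSelenium(response.headers().firstValue("Content-Type").orElse(null));
    }
}
```

A caller would invoke `MimeGate.probe(HttpClient.newHttpClient(), url)` and skip Selenium for PDFs, zip files, and similar content. Keeping the MIME decision separate from the network call also makes it easy to make the HEAD request optional, as suggested in the thread.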