[jira] [Resolved] (NUTCH-3005) Upgrade selenium as needed

2024-04-06 Thread Sebastian Nagel (Jira)
://github.com/apache/nutch/blob/1563396d952393462fffab1f686e9ffd5d006cf6/src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium/HttpWebClient.java#L151] . > Upgrade selenium as needed > -- > > Key: NUTCH-3005 >

[jira] [Updated] (NUTCH-3005) Upgrade selenium as needed

2024-04-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3005: --- Affects Version/s: 1.19 > Upgrade selenium as nee

[jira] [Updated] (NUTCH-3005) Upgrade selenium as needed

2024-04-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-3005: --- Fix Version/s: 1.20 > Upgrade selenium as nee

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-30 Thread Hudson (Jira)
runk #155 (See [https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/155/]) NUTCH-3036 Upgrade org.seleniumhq.selenium:selenium-java dependency i… (#807) (github: [https://github.com/apache/nutch/commit/1563396d952393462fffab1f686e9ffd5d006cf6]) * (edit) src/plugin/lib-selenium/README.md * (

[jira] [Resolved] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-30 Thread Lewis John McGibbney (Jira)
in lib-selenium > > > Key: NUTCH-3036 > URL: https://issues.apache.org/jira/browse/NUTCH-3036 > Project: Nutch > Issue Type: Improvement > Comp

[jira] [Closed] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-30 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-3036. --- > Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selen

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-30 Thread ASF GitHub Bot (Jira)
ttps://github.com/apache/nutch/pull/807 > Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium > > > Key: NUTCH-3036 > URL: https://issues.apache.org/jir

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)
UTCH-3036 Upgrade org.seleniumhq.selenium:selenium-java dependency i… URL: https://github.com/apache/nutch/pull/807 > Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium > > >

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)
Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium > > > Key: NUTCH-3036 > URL: https://issues.apache.org/jira/browse/NUTCH-3036 > Project: Nutch >

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)
ttps://github.com/apache/nutch/pull/807#issuecomment-1998718730 There are some tangential proposed changes (such as improvements to logging) to this PR but they concern the relevant Class files. > Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-s

[jira] [Work stopped] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread Lewis John McGibbney (Jira)
lib-selenium > > > Key: NUTCH-3036 > URL: https://issues.apache.org/jira/browse/NUTCH-3036 > Project: Nutch > Issue Type: Improvement > Comp

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)
o we need to rethink how to do this. For example, the [FirefoxDriver has a pretty elegant way of doing this](https://www.selenium.dev/selenium/docs/api/java/org/openqa/selenium/firefox/FirefoxDriver.html#getFullPageScreenshotAs(org.openqa.selenium.OutputType)) but it is different on other brow

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)
ttps://github.com/apache/nutch/pull/807#issuecomment-1998711992 PR ready for review. Tested on * MacBook Pro * Apple M1 Pro * Sonoma 14.4 * Firefox 115.X (compatible with current version of Selenium) > Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-s

[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)
Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium > > > Key: NUTCH-3036 > URL: https://issues.apache.org/jira/browse/NUTCH-3036 > Project: Nutch >

[jira] [Work started] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread Lewis John McGibbney (Jira)
lib-selenium > > > Key: NUTCH-3036 > URL: https://issues.apache.org/jira/browse/NUTCH-3036 > Project: Nutch > Issue Type: Improvement > Comp

[jira] [Created] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3036: --- Summary: Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium Key: NUTCH-3036 URL: https://issues.apache.org/jira/browse/NUTCH-3036

[jira] [Updated] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
webdriver in selenium than it does to render/fetch a couple of test pages I'm working with. On linux with a chrome driver, ~1.5 seconds to load the driver and then .5 of a second to fetch/render the page. On a mac, ~1.2 seconds to load and then .5 of a second to fetch/render. On a mac with fi

[jira] [Comment Edited] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
5 PM: -- On further reflection, what the above means is that if each of our threads creates its own web driver for every fetch, that means that the selenium instance is blocking the creation of these web-drivers until the current number of connections is less than the number of worker nodes T

[jira] [Commented] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
s is that if each of our threads creates its own web driver for every fetch, that means that the selenium instance is blocking the creation of these web-drivers until the current number of connections is < the number of worker nodes X SE_NODE_MAX_SESSIONS. In short, we're already rat
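The limit described in this comment can be written out as a small calculation. This is only a sketch: the class `GridCapacity`, its method names, and the numbers used with it are hypothetical; only `SE_NODE_MAX_SESSIONS` comes from the thread.

```java
/** Illustrative only: not part of Nutch or Selenium. */
final class GridCapacity {

    /** Sessions the grid can serve at once: worker nodes x SE_NODE_MAX_SESSIONS. */
    static int maxConcurrentSessions(int workerNodes, int maxSessionsPerNode) {
        if (workerNodes < 0 || maxSessionsPerNode < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return Math.multiplyExact(workerNodes, maxSessionsPerNode);
    }

    /** Fetcher threads left waiting because every session is already taken. */
    static int blockedThreads(int fetcherThreads, int workerNodes, int maxSessionsPerNode) {
        int capacity = maxConcurrentSessions(workerNodes, maxSessionsPerNode);
        return Math.max(0, fetcherThreads - capacity);
    }
}
```

With, say, 10 fetcher threads, 4 worker nodes, and one session per node, `blockedThreads(10, 4, 1)` is 6: those threads wait for a session instead of fetching.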

[jira] [Comment Edited] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
. :D > Consider pooling remote webdrivers for Selenium? > > > Key: NUTCH-3018 > URL: https://issues.apache.org/jira/browse/NUTCH-3018 > Project: Nutch > Issue Type: Task

[jira] [Commented] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
vers than the {{SE_NODE_MAX_SESSIONS}} which defaults to 1. I think it would still be useful to reuse the webdriver(s) if we can. We could reconnect on exception, etc... This may be a horribly misguided approach. Let me know. :D > Consider pooling remote webdrivers for S
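The idea of reusing webdrivers and reconnecting on exception could look roughly like the sketch below. It is not taken from the issue or from Nutch: `DriverPool` is a hypothetical name, the type parameter `D` stands in for whatever driver type is used, and the caller supplies the factory and the closer. It uses only the Java standard library.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Illustrative fixed-size pool; D stands in for a driver type. Not Nutch or Selenium API. */
final class DriverPool<D> {
    private final BlockingQueue<D> idle;
    private final Supplier<D> factory;
    private final Consumer<D> closer;

    DriverPool(int size, Supplier<D> factory, Consumer<D> closer) {
        this.idle = new ArrayBlockingQueue<>(size);
        this.factory = factory;
        this.closer = closer;
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pay the driver start-up cost once, up front
        }
    }

    /**
     * Borrows a driver, runs the task, and returns the driver to the pool.
     * If the task throws a RuntimeException, the driver is closed and replaced
     * with a fresh one before the exception is rethrown.
     */
    <R> R withDriver(Function<D, R> task) {
        D driver;
        try {
            driver = idle.take(); // blocks while every driver is in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a driver", e);
        }
        try {
            R result = task.apply(driver);
            idle.add(driver);
            return result;
        } catch (RuntimeException e) {
            closeQuietly(driver);
            idle.add(factory.get()); // "reconnect on exception"
            throw e;
        }
    }

    private void closeQuietly(D driver) {
        try {
            closer.accept(driver);
        } catch (RuntimeException ignored) {
            // the driver is being discarded anyway
        }
    }
}
```

A pool sized at or below the grid's session limit keeps threads from queuing on the grid side. The sketch does not handle a factory that fails while replacing a driver.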

[jira] [Updated] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
webdriver in selenium than it does to render/fetch a couple of test pages I'm working with. On a mac with a chrome driver, ~1.5 seconds to load the driver and then .5 of a second to fetch/render the page. On a mac, ~1.2 seconds to load and then .5 of a second to fetch/render. On a mac with fi

[jira] [Created] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3018: -- Summary: Consider pooling remote webdrivers for Selenium? Key: NUTCH-3018 URL: https://issues.apache.org/jira/browse/NUTCH-3018 Project: Nutch Issue Type: Task

[jira] [Resolved] (NUTCH-2888) Selenium Protocol: Support for Selenium 4

2023-09-30 Thread Sebastian Nagel (Jira)
be included in the 1.20 release of Nutch. > Selenium Protocol: Support for Selenium 4 > - > > Key: NUTCH-2888 > URL: https://issues.apache.org/jira/browse/NUTCH-2888 > Project: Nutch >

[jira] [Updated] (NUTCH-2888) Selenium Protocol: Support for Selenium 4

2023-09-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2888: --- Affects Version/s: 1.18 > Selenium Protocol: Support for Seleniu

[jira] [Updated] (NUTCH-2888) Selenium Protocol: Support for Selenium 4

2023-09-30 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2888: --- Fix Version/s: 1.20 > Selenium Protocol: Support for Seleniu

[jira] [Created] (NUTCH-3005) Upgrade selenium as needed

2023-09-26 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3005: -- Summary: Upgrade selenium as needed Key: NUTCH-3005 URL: https://issues.apache.org/jira/browse/NUTCH-3005 Project: Nutch Issue Type: Improvement

[jira] [Resolved] (NUTCH-3000) protocol-selenium returns only the body,strips off the element

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-3000. Fix Version/s: 1.20 Resolution: Fixed > protocol-selenium returns only the body,strips

[jira] [Resolved] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-3001. Fix Version/s: 1.20 Resolution: Fixed > protocol-selenium requires Content-Type hea

[jira] [Commented] (NUTCH-3000) protocol-selenium returns only the body,strips off the element

2023-09-13 Thread Hudson (Jira)
runk #110 (See [https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/110/]) NUTCH-3000 - the selenium protocol should return the full html, not just the inner body element. (tallison: [https://github.com/apache/nutch/commit/820d129a8adff9a34eed2ed3c04cfee377b56b63]) * (edit) src/plugin/lib-sele

[jira] [Commented] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Hudson (Jira)
runk #110 (See [https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/110/]) NUTCH-3001 - fix logic for grabbing bytes if there's no content type in the header (tallison: [https://github.com/apache/nutch/commit/b6f645a4d025fa136f557dd37e9aba611b425fbb]) * (edit) src/plugin/protocol-selenium

[jira] [Commented] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread ASF GitHub Bot (Jira)
ttps://github.com/apache/nutch/pull/774 > protocol-selenium requires Content-Type header > --- > > Key: NUTCH-3001 > URL: https://issues.apache.org/jira/browse/NUTCH-3001 > Project: Nutch >

[jira] [Commented] (NUTCH-3000) protocol-selenium returns only the body,strips off the element

2023-09-13 Thread ASF GitHub Bot (Jira)
ttps://github.com/apache/nutch/pull/773 > protocol-selenium returns only the body,strips off the element > -- > > Key: NUTCH-3000 > URL: https://issues.apache.org/jir

[GitHub] [nutch] tballison merged pull request #773: NUTCH-3000 - the selenium protocol should return the full html, not just the inner body

2023-09-13 Thread via GitHub
tballison merged PR #773: URL: https://github.com/apache/nutch/pull/773 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@nutch.apache.org

[jira] [Commented] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread ASF GitHub Bot (Jira)
cordingly? We will be able to faster integrate your pull request if these conditions are met. If you have any questions how to fix your problem or about using Nutch in general, please sign up for the [Nutch mailing list](https://nutch.apache.org/mailing_lists.html). Thanks! > proto

[jira] [Commented] (NUTCH-3000) protocol-selenium returns only the body,strips off the element

2023-09-13 Thread ASF GitHub Bot (Jira)
). Thanks! > protocol-selenium returns only the body,strips off the element > -- > > Key: NUTCH-3000 > URL: https://issues.apache.org/jira/browse/NUTCH-3000 > Project: Nutch

[GitHub] [nutch] tballison opened a new pull request, #773: NUTCH-3000 - the selenium protocol should return the full html, not just the inner body

2023-09-13 Thread via GitHub
tballison opened a new pull request, #773: URL: https://github.com/apache/nutch/pull/773 …ust the inner body element. Thanks for your contribution to [Apache Nutch](https://nutch.apache.org/)! Your help is appreciated! Before opening the pull request, please verify that * th

[jira] [Updated] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3001: --- Description: It looks like the selenium protocol requires that there be a content-type header

[jira] [Commented] (NUTCH-3000) protocol-selenium returns only the body,strips off the element

2023-09-13 Thread Markus Jelsma (Jira)
. +1 > protocol-selenium returns only the body,strips off the element > -- > > Key: NUTCH-3000 > URL: https://issues.apache.org/jira/browse/NUTCH-3000 > Project: Nutch >

[jira] [Updated] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3001: --- Priority: Minor (was: Major) > protocol-selenium requires Content-Type hea

[jira] [Commented] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764698#comment-17764698 ] Tim Allison commented on NUTCH-3001: Or is the notion that if the selenium prot

[jira] [Updated] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3001: --- Description: It looks like the selenium protocol requires that there be a content-type header

[jira] [Updated] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3001: --- Description: It looks like the selenium protocol requires that there be content-type. The logic

[jira] [Created] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3001: -- Summary: protocol-selenium requires Content-Type header Key: NUTCH-3001 URL: https://issues.apache.org/jira/browse/NUTCH-3001 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-3000) protocol-selenium returns only the body,strips off the element

2023-09-13 Thread Sebastian Nagel (Jira)
best choice for the default. > protocol-selenium returns only the body,strips off the element > -- > > Key: NUTCH-3000 > URL: https://issues.apache.org/jira/browse/NUTCH-3000 >

[jira] [Created] (NUTCH-3000) protocol-selenium returns only the body,strips off the element

2023-09-13 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3000: -- Summary: protocol-selenium returns only the body,strips off the element Key: NUTCH-3000 URL: https://issues.apache.org/jira/browse/NUTCH-3000 Project: Nutch

[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-02-18 Thread Hudson (Jira)
runk #94 (See [https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/94/]) NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit (snagel: [https://github.com/apache/nutch/commit/383aeca5d30342b29b6ee6e05f8f3052c62d7303]) * (edit) src/plugin/lib-htmlunit/ivy.xml * (edit) src/plugin/lib-selenium/plugin

[jira] [Resolved] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-02-18 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2980. Resolution: Implemented > Upgrade Selenium Java to 4.

[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-02-18 Thread ASF GitHub Bot (Jira)
snap packages are allowed to write to (they cannot write to the default `/tmp/`). Thanks, @KamilMroczek ! > Upgrade Selenium Java to 4.7.2 > -- > > Key: NUTCH-2980 > URL: https://issues.apache.org/jir

[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-02-18 Thread ASF GitHub Bot (Jira)
URL: https://github.com/apache/nutch/pull/753 > Upgrade Selenium Java to 4.7.2 > -- > > Key: NUTCH-2980 > URL: https://issues.apache.org/jira/browse/NUTCH-2980 > Project: Nutch > Issue Type: Improvem

[GitHub] [nutch] sebastian-nagel merged pull request #753: NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit

2023-02-18 Thread via GitHub
sebastian-nagel merged PR #753: URL: https://github.com/apache/nutch/pull/753 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@nutch.apac

[GitHub] [nutch] sebastian-nagel commented on pull request #753: NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit

2023-02-18 Thread via GitHub
sebastian-nagel commented on PR #753: URL: https://github.com/apache/nutch/pull/753#issuecomment-1435694153 Finally, I was able to successfully test it - the reason was that on recent Ubuntu systems Firefox and Chromium are installed as snap packages. This adds extra sandboxing and requires

[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-01-21 Thread ASF GitHub Bot (Jira)
URL: https://github.com/apache/nutch/pull/753#issuecomment-1399257710 Ok. I was able to run in local mode on mac with firefox & chrome. And also on AWS Linux with Chrome. > Upgrade Selenium Java to 4.7.2 > -- > > Key: NUTCH-2980

[GitHub] [nutch] KamilMroczek commented on pull request #753: NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit

2023-01-21 Thread via GitHub
KamilMroczek commented on PR #753: URL: https://github.com/apache/nutch/pull/753#issuecomment-1399257710 Ok. I was able to run in local mode on mac with firefox & chrome. And also on AWS Linux with Chrome. -- This is an automated message from the Apache Git Service. To respond to the mess

[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-01-21 Thread ASF GitHub Bot (Jira)
URL: https://github.com/apache/nutch/pull/753#issuecomment-1399245784 Hi @KamilMroczek, indeed - keeping the licenses up-to-date is a difficult task. See also NUTCH-2290 and NUTCH-2981. Unfortunately, so far I wasn't able to successfully test protocol-selenium and this PR. But t

[GitHub] [nutch] sebastian-nagel commented on pull request #753: NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit

2023-01-21 Thread via GitHub
test protocol-selenium and this PR. But this is on my side. Both Chrome and Firefox (recent browser and driver versions) show up (in headful mode) but for some reason the driver then times out with obscure error messages. It's reproducible from Python, so it's something with my system (

[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-01-20 Thread ASF GitHub Bot (Jira)
URL: https://github.com/apache/nutch/pull/753#issuecomment-1398516536 Yeah the verifying of the licenses was a bit of a pain. I found some tools to help with finding licenses for a batch of libraries but none of them (which I could find) supported our format. > Upgrade Selenium Java t

[GitHub] [nutch] KamilMroczek commented on pull request #753: NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit

2023-01-20 Thread GitBox
KamilMroczek commented on PR #753: URL: https://github.com/apache/nutch/pull/753#issuecomment-1398516536 Yeah the verifying of the licenses was a bit of a pain. I found some tools to help with finding licenses for a batch of libraries but none of them (which I could find) supported our form

Re: Upgrading Selenium

2023-01-20 Thread Markus Jelsma
> There must be a way, some how, some time. There isn't: https://github.com/seleniumhq/selenium-google-code-issue-archive/issues/141 Op do 19 jan. 2023 om 15:23 schreef Markus Jelsma < markus.jel...@openindex.io>: > > This makes some sense if you do not know anything about t

[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-01-19 Thread ASF GitHub Bot (Jira)
d as part of the selenium-java and htmlunit upgrades. They are all Apache 2.0, MIT or EDL. async-http-client async-http-client-netty-utils auto-common auto-service auto-service-annotations checker-qual dec failsafe failureaccess htmlunit-xpath jakarta.activa

[GitHub] [nutch] KamilMroczek opened a new pull request, #753: NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit

2023-01-19 Thread GitBox
The following libraries were added as part of the selenium-java and htmlunit upgrades. They are all Apache 2.0, MIT or EDL. async-http-client async-http-client-netty-utils auto-common auto-service auto-service-annotations checker-qual dec failsafe failureaccess

[jira] [Updated] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-01-19 Thread Kamil Mroczek (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamil Mroczek updated NUTCH-2980: - Description: Selenium version is quite old and had some issues processing a website. Once I

[jira] [Created] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-01-19 Thread Kamil Mroczek (Jira)
Kamil Mroczek created NUTCH-2980: Summary: Upgrade Selenium Java to 4.7.2 Key: NUTCH-2980 URL: https://issues.apache.org/jira/browse/NUTCH-2980 Project: Nutch Issue Type: Improvement

Re: Upgrading Selenium

2023-01-19 Thread Markus Jelsma
> This makes some sense if you do not know anything about the URL. > - a HEAD request could do almost the same > - often one knows whether there are only HTML pages or also PDFs, zip files, >and other stuff not suitable for Selenium. Could make the HEAD request >option
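The MIME-type check discussed in this message can be sketched as a small decision function. This is an assumption-laden illustration, not the plugin's code: `RenderDecision`, its method names, and the set of browser-suitable types are hypothetical, and the header value is assumed to come from a HEAD (or first GET) response.

```java
import java.util.Locale;
import java.util.Set;

/** Illustrative only: decides whether a URL is worth sending to a browser. */
final class RenderDecision {
    // Assumed set of media types a browser should render; adjust as needed.
    private static final Set<String> BROWSER_TYPES =
            Set.of("text/html", "application/xhtml+xml");

    /** Drops parameters such as "; charset=UTF-8" and lower-cases the media type. */
    static String mediaType(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return null;
        }
        int semi = contentTypeHeader.indexOf(';');
        String type = semi >= 0 ? contentTypeHeader.substring(0, semi) : contentTypeHeader;
        type = type.trim().toLowerCase(Locale.ROOT);
        return type.isEmpty() ? null : type;
    }

    /**
     * @param contentTypeHeader raw Content-Type header value, or null if absent
     * @param renderWhenUnknown what to do when the server sent no usable type
     */
    static boolean shouldUseBrowser(String contentTypeHeader, boolean renderWhenUnknown) {
        String type = mediaType(contentTypeHeader);
        if (type == null) {
            return renderWhenUnknown;
        }
        return BROWSER_TYPES.contains(type);
    }
}
```

Making the "unknown type" case an explicit parameter keeps a missing Content-Type header from silently skipping the page, while PDFs, zip files, and similar content stay out of the browser.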

Re: Upgrading Selenium

2023-01-19 Thread Sebastian Nagel
Hi Kamil, hi Markus, upgrading the Selenium plugin is very appreciated! > Besides that, the plugin also needs some overhaul. Definitely. > It currently first downloads the URL with HttpClient, and then, depending on > MIME-type, it may or may not forward the URL to Selenium so

Re: Upgrading Selenium

2023-01-18 Thread Kamil Mroczek
y not forward the URL to Selenium so it can be downloaded again. > > There is a lot of code in the plugin that should be removed. I would also > opt for merging the lib-selenium plugin with the protocol-selenium plugin. > There is no obvious need for having it separated. >

Re: Upgrading Selenium

2023-01-17 Thread Markus Jelsma
may or may not forward the URL to Selenium so it can be downloaded again. There is a lot of code in the plugin that should be removed. I would also opt for merging the lib-selenium plugin with the protocol-selenium plugin. There is no obvious need for having it separated. These can be, of course

Upgrading Selenium

2023-01-17 Thread Kamil Mroczek
Hello, I am sending a message to inquire whether I should submit a patch which updates selenium to the latest version. Although it is a major version upgrade to the library, very few code changes were needed to update. For a preview of the changes I made you can look here <https://github.

[jira] [Created] (NUTCH-2907) protocol-selenium: HTTPS proxy not working

2021-11-18 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2907: -- Summary: protocol-selenium: HTTPS proxy not working Key: NUTCH-2907 URL: https://issues.apache.org/jira/browse/NUTCH-2907 Project: Nutch Issue Type

[jira] [Created] (NUTCH-2888) Selenium Protocol: Support for Selenium 4

2021-08-16 Thread Mikko Kivistoe (Jira)
Mikko Kivistoe created NUTCH-2888: - Summary: Selenium Protocol: Support for Selenium 4 Key: NUTCH-2888 URL: https://issues.apache.org/jira/browse/NUTCH-2888 Project: Nutch Issue Type: New

[GitHub] [nutch] lewismc closed pull request #632: fireant upgrade dependency selenium-java in src/plugin/lib-selenium/ivy.xml to 3.141.59

2021-04-30 Thread GitBox
lewismc closed pull request #632: URL: https://github.com/apache/nutch/pull/632 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please con

[GitHub] [nutch] fireant-ci opened a new pull request #632: fireant upgrade dependency selenium-java in src/plugin/lib-selenium/ivy.xml to 3.141.59

2021-04-30 Thread GitBox
fireant-ci opened a new pull request #632: URL: https://github.com/apache/nutch/pull/632 fireant upgrade dependency selenium-java in src/plugin/lib-selenium/ivy.xml to 3.141.59 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[jira] [Updated] (NUTCH-2825) lib-selenium: property webdriver.chrome.driver overwritten by selenium.grid.binary

2020-09-16 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2825: --- Description: (see [https://stackoverflow.com/questions/63456514/nutch-selenium-interactive

[jira] [Updated] (NUTCH-2825) lib-selenium: property webdriver.chrome.driver overwritten by selenium.grid.binary

2020-08-21 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2825: --- Summary: lib-selenium: property webdriver.chrome.driver overwritten by selenium.grid.binary

[jira] [Created] (NUTCH-2825) lib-selenium: property

2020-08-21 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2825: -- Summary: lib-selenium: property Key: NUTCH-2825 URL: https://issues.apache.org/jira/browse/NUTCH-2825 Project: Nutch Issue Type: Bug

[jira] [Comment Edited] (NUTCH-2681) ClassCastException - Apache Nutch 1.x, Selenium v2.48.2, firefox 31.4.0

2020-04-23 Thread Sebastian Nagel (Jira)
1 AM: --- Well, Nutch now uses Selenium 3.141.5 (after NUTCH-2676) and Firefox is on version 75. Closing Thanks, [~venkata...@hcl.com]! was (Author: wastl-nagel): Well, Nutch now uses Selenium 3.141.5 (after NUTCH-2716) and Firefox is on version 75. Closing Thanks, [~venkata...@hcl

[jira] [Updated] (NUTCH-2681) ClassCastException - Apache Nutch 1.x, Selenium v2.48.2, firefox 31.4.0

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2681: --- Fix Version/s: (was: 1.17) > ClassCastException - Apache Nutch 1.x, Selenium v2.4

[jira] [Resolved] (NUTCH-2681) ClassCastException - Apache Nutch 1.x, Selenium v2.48.2, firefox 31.4.0

2020-04-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2681. Resolution: Abandoned Well, Nutch now uses Selenium 3.141.5 (after NUTCH-2716) and Firefox

[jira] [Commented] (NUTCH-2649) Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit

2020-01-23 Thread Hudson (Jira)
3663 (See [https://builds.apache.org/job/Nutch-trunk/3663/]) Fix for NUTCH-2649: Optionally skip TLS/SSL certificate validation for (shbalakuntala: [https://github.com/apache/nutch/commit/aa72b7506c74e7f95792732809449a14a3adced7]) * (edit) src/plugin/protocol-selenium/src/java/org/apache/nutch/prot

[jira] [Resolved] (NUTCH-2649) Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit

2020-01-23 Thread Sebastian Nagel (Jira)
unify the multiple copies of the DummyX509TrustManager. > Optionally skip TLS/SSL certificate validation for protocol-selenium and > protocol-htmlunit > -- > > Key: NUTCH-2649
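For readers unfamiliar with the pattern, skipping certificate validation in Java is commonly done with a trust manager that accepts every chain. The sketch below is not the Nutch `DummyX509TrustManager`; `TrustAllManager` and `insecureContext` are hypothetical names, and only standard `javax.net.ssl` classes are used. It is suitable only where validation is deliberately switched off, such as crawling sites with expired or self-signed certificates.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/** Illustrative trust manager that performs no certificate checks. */
final class TrustAllManager implements X509TrustManager {

    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType) {
        // intentionally empty: every client chain is accepted
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType) {
        // intentionally empty: every server chain is accepted
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        return new X509Certificate[0];
    }

    /** Builds an SSLContext whose only trust manager is the one above. */
    static SSLContext insecureContext() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, new TrustManager[] { new TrustAllManager() }, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not create TLS context", e);
        }
    }
}
```

Keeping a single class like this in one shared place, rather than one copy per plugin, is the kind of unification the comment refers to.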

[jira] [Created] (NUTCH-2766) Update selenium-based protocol plugins to be in sync with protocol-http

2020-01-23 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2766: -- Summary: Update selenium-based protocol plugins to be in sync with protocol-http Key: NUTCH-2766 URL: https://issues.apache.org/jira/browse/NUTCH-2766 Project

[jira] [Commented] (NUTCH-2649) Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit

2020-01-23 Thread ASF GitHub Bot (Jira)
g on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Optionally skip TLS/SSL certificate validation for protocol-selenium and > protoco

[jira] [Commented] (NUTCH-2649) Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit

2020-01-21 Thread ASF GitHub Bot (Jira)
#496: Fix for NUTCH-2649: Optionally skip TLS/SSL certificate validation fo… URL: https://github.com/apache/nutch/pull/496#issuecomment-576678970 Thanks, @balashashanka! Code compiles now. I've tested protocol-selenium on https://expired.badssl.com/: ``` $> bin/nutch

[jira] [Commented] (NUTCH-2649) Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit

2020-01-19 Thread ASF GitHub Bot (Jira)
vice, please contact Infrastructure at: us...@infra.apache.org > Optionally skip TLS/SSL certificate validation for protocol-selenium and > protocol-htmlunit > -- > >

[jira] [Commented] (NUTCH-2649) Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit

2020-01-19 Thread ASF GitHub Bot (Jira)
e to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Optionally skip TLS/SSL certificate validation for protocol-selenium and > protoco

[jira] [Commented] (NUTCH-2649) Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit

2020-01-17 Thread ASF GitHub Bot (Jira)
uest #496: Fix for NUTCH-2649: Optionally skip TLS/SSL certificate validation fo… URL: https://github.com/apache/nutch/pull/496 Added the functionality to: protocol-htmlunit, protocol-interactiveselenium and protocol-sele

[jira] [Assigned] (NUTCH-2649) Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit

2019-12-21 Thread Shashanka Balakuntala Srinivasa (Jira)
> Optionally skip TLS/SSL certificate validation for protocol-selenium and > protocol-htmlunit > -- > > Key: NUTCH-2649 > URL: https://issues.apache.org/jira

[jira] [Resolved] (NUTCH-2024) httpcore classpath jar conflict when invoking protocol-selenium

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2024. Resolution: Cannot Reproduce Hi [~lewismc], closing this old issue for now. The selenium

[jira] [Updated] (NUTCH-2118) browser requests sometimes timeout when using the selenium grid because of port access issues

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2118: --- Affects Version/s: 1.15 > browser requests sometimes timeout when using the selenium g

[jira] [Resolved] (NUTCH-2126) Use selenium protocol for specific sites

2019-11-22 Thread Sebastian Nagel (Jira)
NUTCH-2678. > Use selenium protocol for specific sites > > > Key: NUTCH-2126 > URL: https://issues.apache.org/jira/browse/NUTCH-2126 > Project: Nutch > Issue Type: Sub-task >

[jira] [Resolved] (NUTCH-2131) Problem running nutch(crawl) with selenium

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2131. Resolution: Won't Do The selenium plugins have been upgraded in NUTCH-2676. Please

[jira] [Resolved] (NUTCH-2240) ava.lang.NoSuchFieldError: INSTANCE selenium nutch

2019-11-22 Thread Sebastian Nagel (Jira)
can hardly be reproduced. In addition, the selenium plugins have been upgraded in NUTCH-2676. Please test with the recent Nutch 1.16 and reopen if the problem persists. Thanks! > ava.lang.NoSuchFieldError: INSTANCE selenium nu

[jira] [Commented] (NUTCH-2681) ClassCastException - Apache Nutch 1.x, Selenium v2.48.2, firefox 31.4.0

2019-11-22 Thread Sebastian Nagel (Jira)
ssant" by NUTCH-2676. Needs verification. > ClassCastException - Apache Nutch 1.x, Selenium v2.48.2, firefox 31.4.0 > --- > > Key: NUTCH-2681 > URL: https://issues.apac

[jira] [Updated] (NUTCH-2681) ClassCastException - Apache Nutch 1.x, Selenium v2.48.2, firefox 31.4.0

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2681: --- Fix Version/s: 1.17 > ClassCastException - Apache Nutch 1.x, Selenium v2.48.2, fire

[jira] [Updated] (NUTCH-2133) Transfer Selenium Documentation to Wiki

2019-10-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2133: --- Summary: Transfer Selenium Documentation to Wiki (was: Transfer Selenium Documentation to

[jira] [Updated] (NUTCH-2133) Transfer Selenium Documentation to WIki

2019-10-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2133: --- Fix Version/s: (was: 2.5) > Transfer Selenium Documentation to W

[jira] [Reopened] (NUTCH-2133) Transfer Selenium Documentation to WIki

2019-10-14 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reopened NUTCH-2133: > Transfer Selenium Documentation to W

[jira] [Updated] (NUTCH-2133) Transfer Selenium Documentation to WIki

2019-09-26 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2133: --- Fix Version/s: (was: 1.16) 1.17 > Transfer Selenium Documentation

[jira] [Updated] (NUTCH-2721) Make the plugin lib-htmlunit depend on lib-selenium

2019-09-23 Thread Sebastian Nagel (Jira)
end on lib-selenium > --- > > Key: NUTCH-2721 > URL: https://issues.apache.org/jira/browse/NUTCH-2721 > Project: Nutch > Issue Type: Improvement > Components: bu

[jira] [Updated] (NUTCH-2649) Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit

2019-09-23 Thread Sebastian Nagel (Jira)
ate validation for protocol-selenium and > protocol-htmlunit > -- > > Key: NUTCH-2649 > URL: https://issues.apache.org/jira/browse/NUTCH-2649 >
