Hi Peter,

The best description of the Selenium plugin is its README.md [1].

Otherwise, could you share which Selenium driver you are using?

Thanks,
Sebastian

[1] https://github.com/apache/nutch/blob/master/src/plugin/protocol-selenium/README.md

On 12/17/24 21:07, Peter Viskup wrote:
I just can't get it working.
At first I got a Selenium timeout exception, even
with libselenium.page.load.delay set. The solution was to increase the
value of page.load.delay, which defaults to 3.

Then I got stuck on Selenium output that shows "You need to enable
JavaScript".

I am running Nutch with this command:
./bin/nutch parsechecker -Dplugin.includes='protocol-selenium|parse-tika' \
  -Dselenium.enable.headless=true \
  -Dlibselenium.page.load.delay=120 \
  -Dpage.load.delay=120 \
  -followRedirects -dumpText https://metais.slovensko.sk

I went through the source code of the libselenium and Selenium protocol
plugins, with no success.

What else can I try to get such a page crawled?

Peter

