Hi Peter,
the best documentation for the Selenium plugin is its README.md [1].
Otherwise, could you share which Selenium driver you are using?
Thanks,
Sebastian
[1]
https://github.com/apache/nutch/blob/master/src/plugin/protocol-selenium/README.md
On 12/17/24 21:07, Peter Viskup wrote:
I just can't get it working...
At first I got a Selenium timeout exception, even
with libselenium.page.load.delay set. The solution was to increase the
value of page.load.delay, which defaults to 3.
Then I got stuck on Selenium's output, which shows "You need to enable
JavaScript".
I am running Nutch with the command:
./bin/nutch parsechecker -Dplugin.includes='protocol-selenium|parse-tika' \
-Dselenium.enable.headless=true \
-Dlibselenium.page.load.delay=120 \
-Dpage.load.delay=120 \
-followRedirects -dumpText https://metais.slovensko.sk
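A quick way to see where the "You need to enable JavaScript" text comes from is to check whether it is already present in the raw HTML, fetched without running any JavaScript. In many client-side rendered apps this text sits in a <noscript> element, which stays in the page even after rendering, so it can show up in extracted text alongside (or instead of) the real content. The sketch below assumes curl and grep are installed; the script name check-js.sh and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch (assumption: curl and grep are available; names are hypothetical).
# Checks whether a page's raw HTML, fetched WITHOUT executing JavaScript,
# already contains the "You need to enable JavaScript" fallback text.

# requires_js: reads HTML on stdin, exits 0 if the fallback text is present.
requires_js() {
  grep -qi 'You need to enable JavaScript'
}

# check_url: fetch the URL (following redirects) and report the result.
check_url() {
  if curl -sL "$1" | requires_js; then
    echo "$1: raw HTML contains the JavaScript fallback text"
  else
    echo "$1: fallback text not found in raw HTML"
  fi
}

# Only run the network check when a URL is passed, e.g.:
#   ./check-js.sh https://metais.slovensko.sk
if [ "$#" -gt 0 ]; then
  check_url "$1"
fi
```

If the raw HTML contains the text but the dumped text contains nothing else, the page content is most likely rendered by JavaScript that had not finished (or did not run) when the page was captured.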
I went through the source code of the lib-selenium and protocol-selenium plugins
without success.
What else can I try to get such a page crawled?
Peter