Just as a follow-up: my problem is akin to this one, which perhaps describes it more precisely:
https://stackoverflow.com/questions/43786034/nutch-selenium-firefox-issue-unable-to-connect-to-host-127-0-0-1-on-port-7055-a

2017-07-14 14:13 GMT+02:00 Filip Stysiak <stysiak.fi...@gmail.com>:

> Dear Nutch users,
>
> [Nutch 1.13]
>
> I am developing an app that needs to crawl and index images. In order
> to fetch dynamic content, such as images in galleries, I started using
> the protocol-selenium plugin. However, after initial success with a single
> URL in seed.txt (though I needed to install a very outdated version of
> Firefox, 31.x), the crawler crashed when I tried to crawl multiple sites
> (a standard scenario in the app).
>
> This was, of course, the result of Nutch starting a separate queue for
> every host, combined with the inability to open several Firefox instances
> through Selenium in local mode.
>
> I then tried switching to Selenium Grid, following:
> https://github.com/apache/nutch/tree/master/src/plugin/protocol-selenium
>
> I used selenium-server-standalone 3.4.0. However, when I started the hub
> and began crawling, the hub didn't register any connection attempts. I
> think nutch-site.xml was properly configured, though I didn't set
> grid.binary.location. I also tried upgrading lib-selenium and the
> server, with little luck. I dis
>
> Does anyone know what the issue is here? Has anyone succeeded in
> configuring protocol-selenium with Selenium Grid and making it work with
> multiple URLs from different hosts in seed.txt?
>
> Thanks in advance,
> Filip
>
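For anyone hitting the same symptom (a hub that never sees a connection), here is a minimal sanity-check sketch, not a fix. It assumes the Nutch 1.13 protocol-selenium settings as I recall them from nutch-default.xml: the grid is only used when `selenium.driver` is `remote`, and the hub address is assembled from `selenium.hub.protocol`, `selenium.hub.host`, `selenium.hub.port` and `selenium.hub.path` (defaults assumed to be http, localhost, 4444 and /wd/hub). The script name `check_grid.py`, its helper functions and the hub's `/status` endpoint are my own assumptions, so please verify the property names against your own copy of nutch-default.xml.

```python
"""check_grid.py: sanity-check protocol-selenium grid settings in nutch-site.xml.

A hypothetical helper; the property names and defaults below are assumed from
Nutch 1.13's nutch-default.xml and should be verified against your copy.
"""
import sys
import urllib.request
import xml.etree.ElementTree as ET

# Settings worth eyeballing when the hub sees no traffic.
GRID_KEYS = [
    "selenium.driver",
    "selenium.hub.protocol",
    "selenium.hub.host",
    "selenium.hub.port",
    "selenium.hub.path",
    "selenium.grid.driver",
    "selenium.grid.binary",
]

# Assumed nutch-default.xml values, used when nutch-site.xml omits a key.
DEFAULTS = {
    "selenium.driver": "firefox",
    "selenium.hub.protocol": "http",
    "selenium.hub.host": "localhost",
    "selenium.hub.port": "4444",
    "selenium.hub.path": "/wd/hub",
}


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def setting(props, key):
    """Value from nutch-site.xml, falling back to the assumed default."""
    return props.get(key, DEFAULTS.get(key, ""))


def hub_url(props):
    """Assemble the hub URL from protocol, host, port and path."""
    return "{}://{}:{}{}".format(
        setting(props, "selenium.hub.protocol"),
        setting(props, "selenium.hub.host"),
        setting(props, "selenium.hub.port"),
        setting(props, "selenium.hub.path"),
    )


def problems(props):
    """List settings that (assumption) keep Nutch from using the hub at all."""
    issues = []
    driver = setting(props, "selenium.driver")
    if driver != "remote":
        issues.append(f"selenium.driver is {driver!r}; the grid needs 'remote'")
    if "protocol-selenium" not in props.get("plugin.includes", ""):
        issues.append("plugin.includes does not include protocol-selenium")
    return issues


def hub_is_up(url, timeout=5.0):
    """True if GET <hub>/status answers with HTTP 200 (endpoint assumed)."""
    try:
        with urllib.request.urlopen(url.rstrip("/") + "/status", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts are all OSErrors
        return False


def main(argv):
    if len(argv) != 2:
        print("usage: python check_grid.py conf/nutch-site.xml")
        return 2
    with open(argv[1], "rb") as f:  # bytes, so the XML declaration sets the encoding
        props = read_properties(f.read())
    for key in GRID_KEYS:
        print(f"{key} = {setting(props, key)!r}")
    for issue in problems(props):
        print("WARNING:", issue)
    url = hub_url(props)
    print(url, "reachable" if hub_is_up(url) else "NOT reachable")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run it as `python check_grid.py conf/nutch-site.xml`. If it reports that `selenium.driver` is not `remote`, my understanding is that Nutch would still try to launch a local Firefox and never contact the hub, which would match the symptom above; an unreachable hub URL points instead at a host, port or path mismatch.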
