Dear Nutch users, [Nutch 1.13]
I am developing an app that needs to crawl and index images. To fetch dynamic content, such as images in galleries, I started using the protocol-selenium plugin. After initial success with a single URL in seed.txt (though I needed to install a very outdated version of Firefox, 31.x), the crawler crashed when I tried to crawl multiple sites, which is the standard scenario in the app. This was, of course, the result of Nutch starting a queue for every different host and Selenium in local mode being unable to open several Firefox instances.

I tried to switch to Selenium Grid, per:
https://github.com/apache/nutch/tree/master/src/plugin/protocol-selenium

I used selenium-server-standalone 3.4.0. However, when I started the hub and started crawling, the hub didn't register any attempts at connecting to it. I think nutch-site.xml was properly configured, though I didn't set the grid.binary.location. I also tried upgrading lib-selenium and the server, with little luck. I dis

Does anyone know what the issue is here? Has anyone succeeded in configuring protocol-selenium with Selenium Grid and made it work with multiple URLs from different hosts in seed.txt?

Thanks in advance,
Filip