What is the value of the configuration property selenium.driver? It looks like it is set to "firefox", which requires Firefox to be installed on the machine running the fetcher. When using a Selenium hub it should be "remote". Please also check the other selenium properties and set them accordingly in your nutch-site.xml. All properties are listed in nutch-default.xml.
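As a rough sketch, the overrides in nutch-site.xml for a hub running in Docker on the same machine might look like the following. The selenium.hub.* and selenium.grid.driver names and the values shown (localhost, 4444, /wd/hub) are assumptions written from memory and were not taken from your setup, so please verify each one against the nutch-default.xml of your Nutch version.

```xml
<!-- nutch-site.xml: illustrative overrides only.
     Property names other than selenium.driver and all values are assumptions;
     check them against nutch-default.xml before use. -->
<configuration>
  <!-- Use a remote WebDriver (Selenium hub) instead of a local Firefox binary -->
  <property>
    <name>selenium.driver</name>
    <value>remote</value>
  </property>
  <!-- Where the hub is reachable (assumed: hub exposed on localhost:4444) -->
  <property>
    <name>selenium.hub.protocol</name>
    <value>http</value>
  </property>
  <property>
    <name>selenium.hub.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>selenium.hub.port</name>
    <value>4444</value>
  </property>
  <property>
    <name>selenium.hub.path</name>
    <value>/wd/hub</value>
  </property>
  <!-- Browser the grid node should start (assumed: a Firefox node is registered) -->
  <property>
    <name>selenium.grid.driver</name>
    <value>firefox</value>
  </property>
</configuration>
```

If the hub runs on another host or port, only the selenium.hub.* values should need to change.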

Thanks,
Sebastian

On 03/05/2018 10:40 AM, narendra singh arya wrote:
> FetcherThread 39 fetch of http://www.bbc.com/ failed with:
> java.lang.RuntimeException: org.openqa.selenium.WebDriverException: Failed to connect to binary FirefoxBinary(/Applications/Firefox.app/Contents/MacOS/firefox-bin) on port 7055; process output follows:
>
> null
>
> Build info: version: '2.48.2', revision: '41bccdd10cf2c0560f637404c2d96164b67d9d67', time: '2015-10-09 13:08:06'
>
> System info: host: 'FDLMC488.local', ip: '192.168.63.89', os.name: 'Mac OS X', os.arch: 'x86_64', os.version: '10.11.6', java.version: '1.8.0_161'
>
> Driver info: driver.version: FirefoxDriver
>
> FetcherThread 39 has no more work available
>
> I tried it on an http website and this is the error.
>
> On 5 March 2018 at 14:48, Yash Thenuan Thenuan <rit2014...@iiita.ac.in> wrote:
>
>> Is there a way to fetch https websites using selenium?
>>
>> On 5 Mar 2018 14:10, "Sebastian Nagel" <wastl.na...@googlemail.com> wrote:
>>
>>>> What will happen if I try to crawl a https website.
>>>
>>> I didn't try it, but I would expect that
>>> - if no protocol plugin other than protocol-selenium is active:
>>>   fetching fails (as reported in NUTCH-2310)
>>> - if another protocol plugin is active which supports https:
>>>   Fetcher will use it to fetch https content
>>>
>>> On 03/05/2018 09:35 AM, narendra singh arya wrote:
>>>> I am using the one you mentioned.
>>>> Now my question is: after specifying protocol-selenium as the initial Fetcher,
>>>> what will happen if I try to crawl a https website?
>>>> And what will happen if I don't set up selenium and try to crawl a website?
>>>> Because it's not throwing any error.
>>>>
>>>> On Mon, 5 Mar 2018, 13:59 Sebastian Nagel, <wastl.na...@googlemail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> it is not used as Fetcher but Fetcher will use it if it fetches content via http.
>>>>> If not used at all, it's likely a configuration issue (plugin.includes) or
>>>>> an unsupported protocol (that's true for https, see NUTCH-2310).
>>>>>
>>>>> Just to confirm: are you really using
>>>>> https://github.com/momer/nutch-selenium-grid-plugin
>>>>> instead of protocol-selenium which is part of Nutch?
>>>>>
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>> On 03/05/2018 09:00 AM, narendra singh arya wrote:
>>>>>> How can I know that protocol-selenium is used as Fetcher? Because I don't
>>>>>> think, after going through all the steps, that it is being used at all.
>>>>>>
>>>>>> On Fri, 2 Mar 2018, 18:28 narendra singh arya, <nsary...@gmail.com> wrote:
>>>>>>
>>>>>>> I want to crawl ajax populated content using nutch.
>>>>>>> I tried this with selenium-grid-plugin on nutch 1.14.
>>>>>>> After following all the steps from the github page nutch-selenium-grid-plugin
>>>>>>> I am not able to fetch the ajax loaded content.
>>>>>>> I have a docker-selenium hub and node running on my mac.
>>>>>>> But I am still not able to fetch the ajax loaded content.
>>>>>>> Help regarding any version of nutch will be appreciated.
>>>>>>> Thanks