What is the value of the configuration property selenium.driver?
It looks like it's "firefox", which requires that Firefox is installed.
When using a Selenium hub it should be "remote". Please also check the
other selenium properties and set them accordingly in your nutch-site.xml.
All properties are listed in nutch-default.xml.
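
As a rough sketch, a hub setup in nutch-site.xml might look like the
fragment below. The host, port, and path values are assumptions for a
Selenium hub running on the same machine, and the selenium.hub.* property
names should be verified against the nutch-default.xml of your Nutch version:

```xml
<!-- Sketch only: the values are assumptions for a local hub;
     verify every property name in nutch-default.xml before use. -->
<property>
  <name>selenium.driver</name>
  <value>remote</value>
</property>
<property>
  <name>selenium.hub.host</name>
  <value>localhost</value>
</property>
<property>
  <name>selenium.hub.port</name>
  <value>4444</value>
</property>
<property>
  <name>selenium.hub.path</name>
  <value>/wd/hub</value>
</property>
```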

Thanks,
Sebastian

On 03/05/2018 10:40 AM, narendra singh arya wrote:
> FetcherThread 39 fetch of http://www.bbc.com/ failed with:
> java.lang.RuntimeException: org.openqa.selenium.WebDriverException: Failed
> to connect to binary
> FirefoxBinary(/Applications/Firefox.app/Contents/MacOS/firefox-bin) on port
> 7055; process output follows:
> 
> null
> 
> Build info: version: '2.48.2', revision:
> '41bccdd10cf2c0560f637404c2d96164b67d9d67', time: '2015-10-09 13:08:06'
> 
> System info: host: 'FDLMC488.local', ip: '192.168.63.89', os.name: 'Mac OS
> X', os.arch: 'x86_64', os.version: '10.11.6', java.version: '1.8.0_161'
> 
> Driver info: driver.version: FirefoxDriver
> 
> FetcherThread 39 has no more work available
> 
> I tried it on an http website and this is the error.
> 
> On 5 March 2018 at 14:48, Yash Thenuan Thenuan <rit2014...@iiita.ac.in>
> wrote:
> 
>> Is there a way to fetch https websites using selenium?
>>
>> On 5 Mar 2018 14:10, "Sebastian Nagel" <wastl.na...@googlemail.com> wrote:
>>
>>>> What will happen if I try to crawl a https website.
>>>
>>> I didn't try it, but I would expect that
>>> - if no protocol plugins other than protocol-selenium are active:
>>>    fetching fails (as reported in NUTCH-2310)
>>> - if another protocol plugin is active which supports https:
>>>    Fetcher will use it to fetch https content
>>>
>>>
>>> On 03/05/2018 09:35 AM, narendra singh arya wrote:
>>>> I am using the one you mentioned.
>>>> Now my question is after specifying protocol-selenium as initial Fetcher,
>>>> What will happen if I try to crawl a https website.
>>>> And what will happen if I don't set up Selenium and try to crawl a website.
>>>> Because it's not throwing any error.
>>>>
>>>> On Mon, 5 Mar 2018, 13:59 Sebastian Nagel, <wastl.na...@googlemail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> it is not used as Fetcher but Fetcher will use it if it fetches content
>>>>> via http.
>>>>> If not used at all, it's likely a configuration issue (plugin.includes) or
>>>>> an unsupported protocol (that's true for https, see NUTCH-2310).
>>>>>
>>>>> Just to confirm: are you really using
>>>>>   https://github.com/momer/nutch-selenium-grid-plugin
>>>>> instead of protocol-selenium which is part of Nutch?
>>>>>
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>> On 03/05/2018 09:00 AM, narendra singh arya wrote:
>>>>>> How can I know that protocol-selenium is used as Fetcher? Because I don't
>>>>>> think after going through all the steps it is being used at all.
>>>>>>
>>>>>> On Fri, 2 Mar 2018, 18:28 narendra singh arya, <nsary...@gmail.com> wrote:
>>>>>>
>>>>>>> I want to crawl ajax populated content using nutch.
>>>>>>> I tried this with selenium-grid-plugin on nutch 1.14.
>>>>>>> After following all the steps from the GitHub page of
>>>>>>> nutch-selenium-grid-plugin
>>>>>>> I am not able to fetch the ajax loaded content.
>>>>>>> I have docker-selenium hub and node running on my mac.
>>>>>>> But I am still not able to fetch the ajax loaded content.
>>>>>>> Help regarding any version of nutch will be appreciated.
>>>>>>> Thanks
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 
