Re: nutch-selenium help

2016-04-12 Thread Sabah Sajjad Khan
Don't even know how I missed that. Thanks for your help! Thank you > On Apr 13, 2016, at 1:38 AM, Mattmann, Chris A (3980) > wrote: > > the wiki says the work has been ported to NUTCH-1933, which has already > been committed to trunk. HTH. > > +

Re: nutch-selenium help

2016-04-12 Thread Mattmann, Chris A (3980)
The wiki says the work has been ported to NUTCH-1933, which has already been committed to trunk. HTH. ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory P

Re: nutch-selenium help

2016-04-12 Thread Sabah Sajjad Khan
This is the wiki page I used: https://wiki.apache.org/nutch/AdvancedAjaxInteraction > On Apr 13, 2016, at 1:30 AM, Mattmann, Chris A (3980) > wrote: > > Hi, the plugin is now part of Nutch, so you don’t need to use the > Github one and can you show me the wiki page by linking to it since >

Re: nutch-selenium help

2016-04-12 Thread Mattmann, Chris A (3980)
Hi, the plugin is now part of Nutch, so you don’t need to use the GitHub one. Can you link me to the wiki page you used? It’s likely out of date. ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science D

Re: nutch-selenium help

2016-04-12 Thread Sabah Sajjad Khan
The link that I provided is the same as the one on the wiki page. > On Apr 13, 2016, at 1:13 AM, Mattmann, Chris A (3980) > wrote: > > Please use the selenium plugin that is part of Nutch and described > on the wiki in the Advanced Ajax Interaction section. > >

Re: nutch-selenium help

2016-04-12 Thread Mattmann, Chris A (3980)
Please use the selenium plugin that is part of Nutch and described on the wiki in the Advanced Ajax Interaction section. ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Prop
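For reference, enabling a bundled Nutch plugin generally means listing it in the `plugin.includes` property of `conf/nutch-site.xml`. The minimal sketch below renders such an override with Python's standard library. The plugin id `protocol-selenium` and the rest of the plugin list are assumptions for illustration; check them against the `plugins/` directory of your Nutch build and the default value in `conf/nutch-default.xml`.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def nutch_site_override(properties: Dict[str, str]) -> str:
    """Render a minimal nutch-site.xml <configuration> holding the given properties."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical plugin list: the assumed "protocol-selenium" id takes the place of protocol-http.
xml_text = nutch_site_override({
    "plugin.includes": "protocol-selenium|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic",
})
print(xml_text)
```

The generated `<property>` element can be pasted into an existing `nutch-site.xml`; values set there override the defaults from `nutch-default.xml`.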

nutch-selenium help

2016-04-12 Thread Sabah Sajjad Khan
Hello, I am very new to Nutch and am having trouble crawling to get the content I need. I am crawling electronic-parts websites to see prices, but when I use readdb to dump, I don't see all the data under content. I have attached the dump file. My setup is Nutch with Selenium using this

HTTPS Problem even using httpclient

2016-04-12 Thread Bin Wang
Hi there, I am testing Nutch against a blog, https://datafireball.com/. I added the link to seed.txt and left regex-urlfilter as it is. I replaced protocol-http with protocol-httpclient, thinking that would make it capable of fetching HTTPS links. However, it failed with the followin
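Swapping one protocol plugin for another comes down to editing the `plugin.includes` regex. A plain text replace of `protocol-http` would also rewrite an existing `protocol-httpclient` entry (producing `protocol-httpclientclient`), so the minimal sketch below only replaces whole plugin ids. The helper name and the example value are hypothetical and only loosely modelled on Nutch's defaults.

```python
import re


def swap_protocol_plugin(includes: str, old: str, new: str) -> str:
    """Replace the plugin id `old` with `new` in a plugin.includes regex value.

    The lookarounds reject matches that are part of a longer id, so swapping
    "protocol-http" leaves an existing "protocol-httpclient" untouched.
    """
    pattern = r"(?<![\w-])" + re.escape(old) + r"(?![\w-])"
    # A function replacement avoids backslash interpretation in `new`.
    return re.sub(pattern, lambda _match: new, includes)


# Hypothetical plugin.includes value.
includes = "protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic"
print(swap_protocol_plugin(includes, "protocol-http", "protocol-httpclient"))
# prints: protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic
```

Running the swap twice is harmless: the second pass finds no standalone `protocol-http` and returns the value unchanged.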

Adding a new field to Nutch + MongoDB datastore using plugin

2016-04-12 Thread jvence
I am running Nutch 2.3.1 configured with MongoDB (using Gora) + Elasticsearch and would like to add a new field to the storage database, NOT the index. I am able to add a field to the Elasticsearch index using a custom plugin but would like to add it to the MongoDB record for each website. I've ad