Please use the Selenium plugin that is part of Nutch and is described
on the wiki in the Advanced Ajax Interaction section.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 4/12/16, 9:38 PM, "Sabah Sajjad Khan" <sabah.kh...@wayne.edu> wrote:

>Hello,
>
>
>I am very new to Nutch and am having trouble crawling the content
>that I need. I am crawling electronic parts websites to collect prices,
>but when I use readdb to dump the results, I don't see all the data
>under content. I have attached the dump file.
>
>
>
>
>My setup is Nutch with Selenium, following the instructions at
>https://github.com/momer/nutch-selenium, but I don't run the last
>command (bin/crawl) because I am not using Solr. Selenium and the
>headless browser both seem to be working, but no data seems to be
>extracted. Any help would be appreciated. As I said, I'm very new, so
>if there is any other information I could provide to help you
>understand my problem, or a way I could track it down myself, please
>let me know.
>
>
>Thank you in advance.
>
>
