Hi Michael

Maybe you could write a custom HTML parser plugin to parse the JavaScript.
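
As a rough illustration of what such a custom parser might do, here is a minimal sketch in Python. It is not Nutch code and not how parse-js is actually implemented; the names `JsLinkExtractor` and `extract_js_links` and the URL heuristic are my own assumptions. It scans inline event handlers and script bodies for string literals that look like URLs. A handler that only submits a form contains no such literal, so nothing is extracted for it, which matches the limitation discussed below.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches single- or double-quoted string literals inside a JavaScript snippet.
STRING_LITERAL = re.compile(r"""(["'])(.*?)\1""")
# Heuristic (an assumption, not Nutch's pattern): absolute URLs, rooted or
# relative paths, or file names ending in a common page extension.
URL_LIKE = re.compile(
    r"^(https?://|/|\.{1,2}/)\S+$|^\S+\.(html?|php|jsp|aspx?)(\?\S*)?$"
)


class JsLinkExtractor(HTMLParser):
    """Collects URL-like string literals from on* attributes and <script> bodies."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        for name, value in attrs:
            # Inline handlers such as onclick, onsubmit, onload.
            if name.startswith("on") and value:
                self._scan(value)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self._scan(data)

    def _scan(self, js):
        for _quote, literal in STRING_LITERAL.findall(js):
            if URL_LIKE.match(literal):
                url = urljoin(self.base_url, literal)
                if url not in self.links:
                    self.links.append(url)


def extract_js_links(html, base_url):
    """Return URL-like literals found in the page's JavaScript, resolved against base_url."""
    parser = JsLinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

A page whose handler is only `document.forms[0].submit()` yields an empty list here; getting at the resulting page would need something that actually executes the script.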



On Tue, Jan 15, 2013 at 6:43 PM, Tejas Patil <[email protected]>wrote:

> AFAIK, you cannot configure the Fetcher to use Firefox or HtmlUnit. You
> will probably have to modify the Nutch source yourself.
>
> Thanks,
> Tejas Patil
>
>
> On Tue, Jan 15, 2013 at 12:02 AM, Michael Gang <[email protected]> wrote:
>
> > Hi,
> >
> > I understand.
> > Is there a way to use a different browser as the fetcher for a
> > predefined set of pages?
> > For example, would it be possible to tell Nutch to use Firefox or
> > HtmlUnit as the fetcher?
> > Many sites load content via AJAX, or a click triggers a form submit,
> > so no real HTML snippets exist.
> >
> > Thanks,
> > David
> >
> >
> > On Sun, Jan 13, 2013 at 8:08 PM, Lewis John Mcgibbney <
> > [email protected]> wrote:
> >
> > > Yes, this should be correct.
> > > If you look at the plugin source you can see the patterns it uses to
> > > extract links.
> > > You can also check what's in your crawldb using the readdb command.
> > > HTH
> > > Lewis
> > >
> > > On Saturday, January 12, 2013, Michael Gang <[email protected]> wrote:
> > > > Hi,
> > > >
> > > > So if some JavaScript actually submits a form, Nutch won't follow
> > > > the link, because it only deals with URLs.
> > > > Is this correct?
> > > >
> > > > Thanks,
> > > > David
> > > >
> > > >
> > > > On Tue, Jan 8, 2013 at 5:15 PM, Michael Gang <[email protected]> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> From the Nutch feature list
> > > >> http://wiki.apache.org/nutch/Features
> > > >> I understand that there is some sort of JavaScript support:
> > > >>
> > > >> JavaScript (for extracting links only?) (parse-js)
> > > >>
> > > >> I don't understand exactly what this means.
> > > >> Say I have a link
> > > >> <a onclick="do_something">
> > > >> or a jQuery binding in onready,
> > > >> and that code opens a new window showing the result of a form
> > > >> submit.
> > > >> Will Nutch extract the resulting page as a link?
> > > >>
> > > >> Thanks,
> > > >> David
> > > >>
> > > >>
> > > >
> > >
> > > --
> > > *Lewis*
> > >
> >
>



-- 
Don't Grow Old, Grow Up... :-)
