A parsing fetcher does everything in the mapper. Please check the output() 
method around line 1012 onwards:

http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java?view=markup

Parsing, signature calculation, and outlink processing (using code in
ParseOutputFormat) all happen there.
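
To illustrate the idea, here is a rough sketch of a single map-side step that
either stores the raw content only or, when parsing is enabled, also parses it,
computes a signature, and collects outlinks in the same pass. This is not the
actual Fetcher.java code: the class InlineParseSketch, its Output type, the
regex-based link extraction, and the use of hashCode() as a signature are all
hypothetical stand-ins. In Nutch 1.x the switch is, as far as I know, the
fetcher.parse property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Nutch code: one map-side step that handles a fetched
// page and, when parsing is enabled, parses, signs and extracts outlinks inline.
class InlineParseSketch {

    // Simplified stand-in for what the mapper would emit for one URL.
    static final class Output {
        final String url;
        final String content;
        final Integer signature;      // null when parsing is left to a later job
        final List<String> outlinks;  // empty when parsing is left to a later job

        Output(String url, String content, Integer signature, List<String> outlinks) {
            this.url = url;
            this.content = content;
            this.signature = signature;
            this.outlinks = outlinks;
        }
    }

    // Naive link extraction; a real parser plugin does far more than this.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    // Assumed analogue of the fetcher's output() step for already-fetched content.
    static Output output(String url, String content, boolean parsing) {
        if (!parsing) {
            // Non-parsing fetcher: emit raw content only, parse in a separate job.
            return new Output(url, content, null, new ArrayList<String>());
        }
        // Parsing fetcher: parse, sign and collect outlinks in the same pass.
        List<String> outlinks = new ArrayList<String>();
        Matcher m = HREF.matcher(content);
        while (m.find()) {
            outlinks.add(m.group(1));
        }
        // Stand-in signature; an assumption made only to keep the sketch short.
        Integer signature = Integer.valueOf(content.hashCode());
        return new Output(url, content, signature, outlinks);
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.org/a\">a</a>";
        Output parsed = output("http://example.org/", html, true);
        System.out.println(parsed.outlinks);
    }
}
```

With parsing set to false, the same call returns only the raw content, which
corresponds to running the parser later as its own job.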

Cheers,
Markus
 
 
-----Original message-----
> From:Weilei Zhang <zhan...@gmail.com>
> Sent: Sat 09-Feb-2013 23:40
> To: user@nutch.apache.org
> Subject: Re: performance question: fetcher and parser in separate map/reduce 
> jobs?
> 
> This is indeed helpful. Thanks Lewis.
> 
> On Wed, Feb 6, 2013 at 6:50 PM, Lewis John Mcgibbney
> <lewis.mcgibb...@gmail.com> wrote:
> > I've finally added this to our FAQ
> >
> > http://wiki.apache.org/nutch/FAQ#Can_I_parse_during_the_fetching_process.3F
> >
> > This should explain it for you.
> > Lewis
> >
> > On Wed, Feb 6, 2013 at 6:31 PM, Weilei Zhang <zhan...@gmail.com> wrote:
> >
> >> Hi
> >> I have a performance question:
> >> Why are the fetcher and parser staged as two separate jobs instead of one?
> >> Intuitively, the parser could be included as part of the fetcher reducer,
> >> couldn't it? That seems more efficient.
> >> Thanks
> >> --
> >> Best Regards
> >> -Weilei
> >>
> >
> >
> >
> > --
> > *Lewis*
> 
> 
> 
> -- 
> Best Regards
> -Weilei
> 
