A parsing fetcher does everything in the mapper. Please check the output() method around line 1012 onwards:
http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java?view=markup

Parsing, signature calculation, and outlink processing (using code in ParseOutputFormat) all happen there.

Cheers,
Markus

-----Original message-----
> From: Weilei Zhang <zhan...@gmail.com>
> Sent: Sat 09-Feb-2013 23:40
> To: user@nutch.apache.org
> Subject: Re: performance question: fetcher and parser in separate map/reduce jobs?
>
> This is indeed helpful. Thanks Lewis.
>
> On Wed, Feb 6, 2013 at 6:50 PM, Lewis John Mcgibbney
> <lewis.mcgibb...@gmail.com> wrote:
> > I've eventually added this to our FAQ
> >
> > http://wiki.apache.org/nutch/FAQ#Can_I_parse_during_the_fetching_process.3F
> >
> > This should explain it for you.
> > Lewis
> >
> > On Wed, Feb 6, 2013 at 6:31 PM, Weilei Zhang <zhan...@gmail.com> wrote:
> >
> >> Hi
> >> I have a performance question:
> >> Why are the fetcher and parser staged in two separate jobs instead of one?
> >> Intuitively, the parser could be included as part of the fetcher reducer,
> >> couldn't it? This seems more efficient.
> >> Thanks
> >> --
> >> Best Regards
> >> -Weilei
> >>
> >
> > --
> > *Lewis*
>
> --
> Best Regards
> -Weilei
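
For reference, the parsing fetcher discussed in this thread is normally switched on through configuration rather than code. A minimal sketch, assuming a Nutch 1.x setup where local overrides live in conf/nutch-site.xml and the `fetcher.parse` property covered by the FAQ entry linked above is available in your version:

```xml
<!-- Goes inside the <configuration> element of conf/nutch-site.xml
     (assumed location for local overrides). -->
<!-- Sketch only: check nutch-default.xml in your release for the
     exact property name and its default value. -->
<property>
  <name>fetcher.parse</name>
  <value>true</value>
  <description>If true, the fetcher parses content while fetching,
  so a separate parse job is not required.</description>
</property>
```

When the property is false, parsing is expected to run as a separate step after fetching.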