Got it. I am going to document this on the wiki. Thanks.
Steve
-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 29, 2007 2:31 PM
To: nutch-dev@lucene.apache.org
Subject: Re: Sequence File Question
Steve Severance wrote:
DB updates, or more precisely replacements: see e.g. the CrawlDb.install()
method for details. This is not needed for segments, which
are created once and never updated.
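
To make the replacement idea concrete, here is a minimal sketch of swapping a finished job output into a "current" directory. It uses only java.nio.file, not the Hadoop FileSystem API, and it is not the Nutch source: the class name CurrentDirInstall, the "old" backup directory, and the deleteRecursively helper are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of the "current" directory convention;
// not the actual CrawlDb.install() implementation.
class CurrentDirInstall {

    // Swap a freshly written output directory into <dbRoot>/current.
    // The previous version is parked in <dbRoot>/old and then removed.
    static void install(Path newDb, Path dbRoot) throws IOException {
        Path current = dbRoot.resolve("current");
        Path old = dbRoot.resolve("old");

        Files.createDirectories(dbRoot);
        deleteRecursively(old);            // clear leftovers from a failed run
        if (Files.exists(current)) {
            Files.move(current, old);      // keep the previous version for now
        }
        Files.move(newDb, current);        // the new data becomes "current"
        deleteRecursively(old);            // previous version no longer needed
    }

    // Delete a directory tree, children before parents; no-op if absent.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder())
                        .collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("install-demo");
        Path jobOutput = Files.createDirectories(tmp.resolve("job-output"));
        Files.write(jobOutput.resolve("data"), "first version".getBytes());

        Path db = tmp.resolve("linkdb");
        install(jobOutput, db);
        System.out.println("installed into " + db.resolve("current"));
    }
}
```

With a layout like this, a reader can always open the "current" subdirectory and does not need to know how many times the data has been replaced. A write-once directory has no need for the extra level.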
How does the reader know which one it is expecting? For instance, I
can make a reader to read a linkD
> -----Original Message-----
> From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, March 28, 2007 4:34 PM
> To: nutch-dev@lucene.apache.org
> Subject: Re: Sequence File Question
>
Steve Severance wrote:
Let me actually refine that question: why do some directories like the linkdb
have a current, and why do others like parse_data not? Is there a convention
on this?
First, to answer your original question: you should use the
MapFileOutputFormat class for reading such output. It
1 PM
> To: nutch-dev@lucene.apache.org
> Subject: Sequence File Question
>
> Hey guys,
> I have a mapreduce job that sets up a directory for pagerank. It iterates
> over all the segments and then outputs a MapFile containing the data. When I
> go to open the outputted
Hey guys,
I have a MapReduce job that sets up a directory for pagerank. It iterates
over all the segments and then outputs a MapFile containing the data. When I
try to open the output directory with another MapReduce job, it fails,
saying that it cannot find the path. The path that it thinks it is