Well, to address the problem directly I think it would be better to focus
on getting something working with the Nutch distribution you are most
comfortable with. For the time being I would avoid working with trunk (2.0)
unless you can justify otherwise. I would also choose between Nutch 1.2 and
the current 1.3 release rather than focusing on older branches, which may
or may not be stable depending on when you last ran svn update.

If you can try working with a fresh 1.2 or 1.3 (preferably 1.3), then we
could maybe get to the bottom of this one. It would be great to find out
whether there is scope to file a JIRA issue for this.
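
On the metadata question in the quoted message below, here is a minimal
sketch of why copying the metadata map alone would not carry a score across
to a redirect. It assumes, as in Nutch's CrawlDatum, that the score is held
in its own field rather than inside the metadata map. The Datum class and
both helper methods are hypothetical stand-ins written for illustration;
they are not Nutch code, and the explicit score copy is only an assumption
about what adding datum.getScore() amounts to.

```java
import java.util.HashMap;
import java.util.Map;

public class RedirectScoreSketch {

    /** Hypothetical, simplified stand-in for CrawlDatum (not the real Nutch class). */
    public static class Datum {
        private float score; // kept in its own field, outside the metadata map
        private final Map<String, String> metaData = new HashMap<>();

        public float getScore() { return score; }
        public void setScore(float score) { this.score = score; }
        public Map<String, String> getMetaData() { return metaData; }
    }

    /** Mirrors the quoted fragment: only the metadata map is transferred. */
    public static Datum redirectWithMetadataOnly(Datum source) {
        Datum redirect = new Datum();
        redirect.getMetaData().putAll(source.getMetaData());
        return redirect;
    }

    /** Same as above, plus an explicit copy of the score field (assumed fix). */
    public static Datum redirectWithScore(Datum source) {
        Datum redirect = redirectWithMetadataOnly(source);
        redirect.setScore(source.getScore());
        return redirect;
    }

    public static void main(String[] args) {
        Datum source = new Datum();
        source.setScore(0.75f);
        source.getMetaData().put("example.key", "example-value");

        Datum metaOnly = redirectWithMetadataOnly(source);
        Datum withScore = redirectWithScore(source);

        System.out.println("metadata only: score=" + metaOnly.getScore()
                + ", meta=" + metaOnly.getMetaData());
        System.out.println("with score:    score=" + withScore.getScore()
                + ", meta=" + withScore.getMetaData());
    }
}
```

In this sketch the metadata-only copy keeps the default score, so whatever
a scoring filter sees afterwards is independent of the source URL's score
unless that score is copied explicitly.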

Thank you

On Tue, Jul 12, 2011 at 2:02 PM, Nutch User - 1 <nutch.use...@gmail.com> wrote:

> On 07/12/2011 03:42 PM, lewis john mcgibbney wrote:
> > Hi,
> >
> > An observation is that you are using the 1.3 branch, which will now
> > contain some older code. For example the fetcher class has now been
> > upgraded to deal with Nutch-962, which is mentioned at the top of the
> > class as per your URL example.
> >
> > Can anyone explain what the existing metadata being transferred below
> > actually contains, if it does not include the score as you state?
> >
> >         } else {
> >           CrawlDatum newDatum = new CrawlDatum(CrawlDatum.STATUS_LINKED,
> >               datum.getFetchInterval());
> >           // transfer existing metadata
> >           newDatum.getMetaData().putAll(datum.getMetaData());
> >           try {
> >             scfilters.initialScore(url, newDatum);
> >
> > I would have imagined that the metadata would have included the relative
> > initial score we are discussing if it were to be of use in attributing an
> > initial URL's metadata to a redirect?
> > Apart from this, with the addition of your datum.getScore(), do the new
> > scores attributed to the URL redirects accurately reflect your general
> > understanding of the web graph?
>
> I have only been dealing with Nutch 1.2 and 1.3. I tried to set up 2.0
> with Eclipse but failed as described here
> (http://lucene.472066.n3.nabble.com/TestFetcher-hangs-td3091057.html).
> The new scores were as they should have been in my opinion. (Even though
> I would state that Nutch's implementation of OPIC isn't exactly what the
> publication says.) I don't know what information is passed in metadata.
>



-- 
*Lewis*
