Hi Lewis,

I also found that something looks wrong in the DBUpdaterReducer. See the
code block below:
    if (page.getInlinks() != null) {
      page.getInlinks().clear();
    }
    for (ScoreDatum inlink : inlinkedScoreData) {
      page.putToInlinks(new Utf8(inlink.getUrl()), new Utf8(inlink.getAnchor()));
    }

    // Distance calculation.
    // Retrieve smallest distance from all inlinks distances
    // Calculate new distance for current page: smallest inlink distance plus 1.
    // If the new distance is smaller than old one (or if old did not exist yet),
    // write it to the page.
    int smallestDist=Integer.MAX_VALUE;
    for (ScoreDatum inlink : inlinkedScoreData) {
      int inlinkDist = inlink.getDistance();
      if (inlinkDist < smallestDist) {
        smallestDist=inlinkDist;
      }
      page.putToInlinks(new Utf8(inlink.getUrl()), new Utf8(inlink.getAnchor()));
    }

The call 'page.putToInlinks(new Utf8(inlink.getUrl()), new
Utf8(inlink.getAnchor()));' appears twice: once in the loop that rebuilds
the inlinks, and again in the distance-calculation loop. When I removed the
second call, the inbound links came back in my case. I think the second
call is redundant, and it seems to be what causes this bug.
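
To make the suggestion concrete, here is a small standalone sketch of the
two loops with the second put removed. It is only an illustration: Inlink
is a hypothetical stand-in for ScoreDatum, a plain Map stands in for the
page's inlinks, and nothing here touches Gora or HBase, so it does not
reproduce the bug itself. It only shows that the inlinks are fully rebuilt
by the first loop and that the distance loop does not need to write them
again.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InlinkUpdateSketch {

  // Hypothetical stand-in for Nutch's ScoreDatum; only the fields used here.
  static final class Inlink {
    final String url;
    final String anchor;
    final int distance;

    Inlink(String url, String anchor, int distance) {
      this.url = url;
      this.anchor = anchor;
      this.distance = distance;
    }
  }

  // First loop: clear the old inlinks and write each new one exactly once.
  static Map<String, String> rebuildInlinks(Map<String, String> inlinks,
                                            List<Inlink> inlinkedScoreData) {
    inlinks.clear();
    for (Inlink inlink : inlinkedScoreData) {
      inlinks.put(inlink.url, inlink.anchor);
    }
    return inlinks;
  }

  // Second loop: only reads the distances, no second put into the inlinks.
  static int smallestDistance(List<Inlink> inlinkedScoreData) {
    int smallestDist = Integer.MAX_VALUE;
    for (Inlink inlink : inlinkedScoreData) {
      if (inlink.distance < smallestDist) {
        smallestDist = inlink.distance;
      }
    }
    return smallestDist;
  }

  public static void main(String[] args) {
    List<Inlink> data = new ArrayList<>();
    data.add(new Inlink("http://a.example/", "Link A", 3));
    data.add(new Inlink("http://b.example/", "Link B", 1));

    Map<String, String> inlinks = new LinkedHashMap<>();
    inlinks.put("http://stale.example/", "old anchor");

    rebuildInlinks(inlinks, data);
    System.out.println(inlinks);
    System.out.println(smallestDistance(data));
  }
}
```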



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Nutch-2-2-1-missing-inbound-link-when-using-HBase-tp4111216p4112656.html
Sent from the Nutch - User mailing list archive at Nabble.com.
