[ https://issues.apache.org/jira/browse/NUTCH-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-1616.
----------------------------------
    Resolution: Duplicate

Ah, I finally realize this issue is an exact duplicate of NUTCH-1113; my patch is almost identical but flawed. Where I used || it should have been &&. We have been using my patch in production for some time, but I found out that something is still wrong! I traced it to this patch, changed it and checked the related issues. Let's close this issue in favor of NUTCH-1113.

> SegmentMerger missing proper crawl_fetch datum
> ----------------------------------------------
>
>                 Key: NUTCH-1616
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1616
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.7
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Critical
>             Fix For: 1.8
>
>         Attachments: NUTCH-1616-1.8.patch
>
>
> Merged 26036 vs. unmerged 26038 indexed documents! There are two records in
> the merged segment that no longer have a crawl_fetch CrawlDatum with a
> fetch_success status. Instead, the only crawl_fetch CrawlDatum has status
> linked!
> The original segment has two crawl_fetch CrawlDatums, one with the linked and
> one with the fetch_success status.
> Without the fetch_success or not_modified status it is not going to be
> indexed.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)