[ https://issues.apache.org/jira/browse/NUTCH-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche updated NUTCH-850:
--------------------------------

    Attachment: NUTCH-850.patch

> SolrDeleteDuplicates needs to clone the SolrRecord objects
> -----------------------------------------------------------
>
>                 Key: NUTCH-850
>                 URL: https://issues.apache.org/jira/browse/NUTCH-850
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.1
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: 1.2, 2.0
>
>         Attachments: NUTCH-850.patch
>
>
> The reduce() method of SolrDeleteDuplicates deduplicates SolrRecords by
> their signature. The first SolrRecord is stored in a variable _recordToKeep_
> and is compared to the following SolrRecords found with the same signature.
> The trouble is that Hadoop reuses the first instance when values.next() is
> called, so _recordToKeep_ ends up holding the same values as the record
> returned by the latest call to values.next().
> The attached patch clones each SolrRecord before assigning it to
> _recordToKeep_ in order to avoid the problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
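
The object reuse described in the issue can be reproduced without Hadoop. The sketch below is not the attached patch: Record is a hypothetical stand-in for SolrRecord, reusingIterator() only imitates the way Hadoop refills one instance on every values.next(), and the "keep the newest timestamp" rule is an assumed simplification of the real comparison in reduce().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ReuseDemo {
    // Hypothetical stand-in for SolrRecord: mutable, like a Hadoop Writable.
    static class Record {
        String id;
        long tstamp;

        Record() {}

        // Copy constructor, playing the role of the clone in the patch.
        Record(Record other) {
            this.id = other.id;
            this.tstamp = other.tstamp;
        }

        void set(String id, long tstamp) {
            this.id = id;
            this.tstamp = tstamp;
        }
    }

    // Imitates Hadoop's value iterator: next() refills and returns ONE shared instance.
    static Iterator<Record> reusingIterator(String[] ids, long[] tstamps) {
        final Record shared = new Record();
        return new Iterator<Record>() {
            int i = 0;

            public boolean hasNext() {
                return i < ids.length;
            }

            public Record next() {
                shared.set(ids[i], tstamps[i]);
                i++;
                return shared;
            }
        };
    }

    // Returns the ids that would be deleted, keeping the record with the newest tstamp.
    // With copy == false, recordToKeep aliases the shared instance (the reported bug).
    static List<String> dedup(Iterator<Record> values, boolean copy) {
        List<String> deleted = new ArrayList<>();
        Record first = values.next();
        Record recordToKeep = copy ? new Record(first) : first;
        while (values.hasNext()) {
            Record current = values.next();
            if (current.tstamp > recordToKeep.tstamp) {
                deleted.add(recordToKeep.id);
                recordToKeep = copy ? new Record(current) : current;
            } else {
                deleted.add(current.id);
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        String[] ids = {"a", "b", "c"};
        long[] tstamps = {10, 30, 20};
        // Without the copy, every record is compared with itself, so "b" (the newest) is deleted.
        System.out.println(dedup(reusingIterator(ids, tstamps), false)); // [b, c]
        // With the copy, "b" survives and the two older records are deleted.
        System.out.println(dedup(reusingIterator(ids, tstamps), true));  // [a, c]
    }
}
```

In the aliased case recordToKeep and the current value are the same object, so the comparison never sees two different records. Copying on each assignment restores the intended behaviour.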