[ https://issues.apache.org/jira/browse/NUTCH-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119281#comment-13119281 ]
Markus Jelsma commented on NUTCH-1143:
--------------------------------------

It seems the anchor field was once used for indexing the best-ranking anchor for a given URL, but that indexing code is legacy. With the current version, users must invert links, pass the linkdb, and enable index-anchor to index anchors, so having an anchor in LinkDatum is obsolete for now.

Instead of completely removing the anchor code, we should make it optional. That way we can write indexing code later and pass the webgraph to the indexer instead of a linkdb. I opt for defaulting the setting to false (i.e. do not store anchors), since they are unusable at the moment.

> Omit anchor in webgraph's LinkDatum
> -----------------------------------
>
>                 Key: NUTCH-1143
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1143
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.5
>
>
> Anchors are stored unchecked in the webgraph. It looks like this is for cosmetic
> reasons only. When dealing with hundreds of millions of records, it takes up
> significant space and I/O time.
> This issue should add an option to omit the anchor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
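
To illustrate the proposal in the comment (keep the anchor code but make storing it optional, off by default), here is a minimal sketch in plain Java. It is not Nutch's actual LinkDatum and does not use Hadoop's Writable interface. The class name SketchLinkDatum, its fields, its byte layout, and the storeAnchor parameter are all assumptions made up for this example. In a real patch the flag would presumably come from a configuration property.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a webgraph link record; NOT Nutch's LinkDatum. */
final class SketchLinkDatum {
    final String url;
    final String anchor;   // null when the anchor was omitted
    final long timestamp;

    SketchLinkDatum(String url, String anchor, long timestamp) {
        this.url = url;
        this.anchor = anchor;
        this.timestamp = timestamp;
    }

    /** Serializes the record; the anchor text is written only if storeAnchor is true. */
    byte[] toBytes(boolean storeAnchor) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(url);
            boolean hasAnchor = storeAnchor && anchor != null;
            out.writeBoolean(hasAnchor);   // one marker byte so readers know what follows
            if (hasAnchor) {
                out.writeUTF(anchor);
            }
            out.writeLong(timestamp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads a record written by toBytes, with or without an anchor. */
    static SketchLinkDatum fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String url = in.readUTF();
            String anchor = in.readBoolean() ? in.readUTF() : null;
            long timestamp = in.readLong();
            return new SketchLinkDatum(url, anchor, timestamp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SketchLinkDatum datum =
            new SketchLinkDatum("http://example.org/", "Example home page", 0L);
        // Default proposed in the comment: do not store anchors (false).
        System.out.println("without anchor: " + datum.toBytes(false).length + " bytes");
        System.out.println("with anchor:    " + datum.toBytes(true).length + " bytes");
    }
}
```

The marker byte keeps both forms readable by the same code, so anchors could be switched back on later (for example, once indexing from the webgraph exists) without a second record format. The saving per record is roughly the length of the anchor text, which is what adds up over hundreds of millions of records.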