[
https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1142:
-
Attachment: NUTCH-1142-1.5-3.patch
New patch with the ability to normalize and filter existing Li
[
https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1142:
-
Attachment: NUTCH-1142-1.5-2.patch
New patch also filters collected outlinks instead of just map
[
https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1142:
-
Attachment: NUTCH-1142-1.4.patch
Here's a patch for trunk.
> Normalization and
[
https://issues.apache.org/jira/browse/NUTCH-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1142:
-
Description:
The WebGraph programs perform URL normalization. Since normalization of
outlinks i