[
https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Radim Kolar updated NUTCH-1098:
---
Attachment: nutch.diff
Updated patch. It also normalizes unprintable % sequences to upper case. Like
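
A minimal sketch of the upper-casing step mentioned above. It is not taken from the attached patch, and the function name `uppercase_percent_escapes` is hypothetical. For simplicity it upper-cases the hex digits of every `%XX` escape, not only those for unprintable characters, in line with the case normalization described in RFC 3986, section 6.2.2.1.

```python
import re

# Matches one percent-escape: '%' followed by exactly two hex digits.
_PCT_ESCAPE = re.compile(r"%([0-9a-fA-F]{2})")


def uppercase_percent_escapes(url: str) -> str:
    """Return url with the hex digits of every %XX escape upper-cased.

    Hypothetical helper for illustration only; malformed escapes such as
    a trailing '%' or '%zz' do not match the pattern and are left untouched.
    """
    return _PCT_ESCAPE.sub(lambda m: "%" + m.group(1).upper(), url)


if __name__ == "__main__":
    print(uppercase_percent_escapes("http://example.com/a%2fb%c3%a9"))
```

With this step in place, URLs that differ only in the case of their escapes (for example `%2f` and `%2F`) normalize to the same string.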
[
https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Radim Kolar updated NUTCH-1098:
---
Attachment: (was: urlnormalizer.patch)
better url-normalizer basic
---
[
https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091740#comment-13091740
]
Radim Kolar commented on NUTCH-937:
---
We should stick with Hadoop 0.20.203.0, not CDH, and
better url-normalizer basic
---
Key: NUTCH-1098
URL: https://issues.apache.org/jira/browse/NUTCH-1098
Project: Nutch
Issue Type: Improvement
Components: fetcher
Affects Versions: 1.3
[
https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Radim Kolar updated NUTCH-1098:
---
Attachment: urlnormalizer.patch
Patch against branch-1.4
better url-normalizer basic
[
https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088443#comment-13088443
]
Radim Kolar commented on NUTCH-990:
---
I have this problem too: protocol-httpclient fails