[
https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-706:
--
Fix Version/s: 2.2
Summary: Url regex normalizer: default pattern for session id remova
Sebastian Nagel updated NUTCH-706:
--
Attachment: NUTCH-706-2.patch
Second attempt at the patch. The first one does not remove:
{code}
?_se
Sebastian Nagel updated NUTCH-706:
--
Attachment: NUTCH-706.patch
- fix the pattern by adding an anchor prohibiting inner-word matches
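To illustrate what an anchor prohibiting inner-word matches does, here is a minimal sketch. Both patterns, the helper `strip_session_id`, and the sample URL are hypothetical and chosen for illustration only; they are not the patterns from the attached patches. The unanchored pattern also matches the `sid=` inside `newsid=`, while the lookbehind variant only matches when `sid=` directly follows a parameter delimiter.

```python
import re

# Hypothetical patterns for illustration; NOT the ones from the NUTCH-706 patches.
# Unanchored: matches "sid=" anywhere, even inside a longer parameter name.
UNANCHORED = re.compile(r"sid=[^&#]*&?", re.IGNORECASE)
# Anchored: the lookbehind requires a delimiter (?, & or ;) right before "sid=".
ANCHORED = re.compile(r"(?<=[?&;])sid=[^&#]*&?", re.IGNORECASE)


def strip_session_id(url, pattern):
    """Remove every match of `pattern` from `url`."""
    return pattern.sub("", url)


url = "http://example.com/page?newsid=42&sid=abc123&lang=en"

# The unanchored pattern damages the unrelated "newsid" parameter.
print(strip_session_id(url, UNANCHORED))
# The anchored pattern removes only the real "sid" parameter.
print(strip_session_id(url, ANCHORED))
```

With the anchored variant, `newsid=42` survives because the character before its `sid=` is `w`, not a delimiter.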
Markus Jelsma updated NUTCH-706:
Fix Version/s: 1.6
> Url regex normalizer
>
>
> Key: NUTCH-