[jira] [Updated] (NUTCH-706) Url regex normalizer: default pattern for session id removal not to match "newsId"

2012-10-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-706: -- Fix Version/s: 2.2 Summary: Url regex normalizer: default pattern for session id remova

[jira] [Updated] (NUTCH-706) Url regex normalizer

2012-08-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-706: -- Attachment: NUTCH-706-2.patch Second trial for patch. The first one does not remove: {code} ?_se

[jira] [Updated] (NUTCH-706) Url regex normalizer

2012-07-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-706: -- Attachment: NUTCH-706.patch - fix the pattern by adding an anchor prohibiting inner-word matches
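The fix described in this update (an anchor that prohibits inner-word matches) can be illustrated with a minimal sketch. The patterns below are hypothetical and written with Python's `re` module only for demonstration; they are not the patterns from the attached patch or from Nutch's shipped normalizer configuration, and `strip_session_id` is an assumed helper name.

```python
import re

# Hypothetical patterns for illustration only; they are NOT the patterns
# from the NUTCH-706 patches or from Nutch's shipped configuration.
SESSION_PARAMS = r"(?:sid|phpsessid|sessionid)"

# Unanchored: matches the parameter name anywhere, even inside another word.
UNANCHORED = re.compile(SESSION_PARAMS + r"=[^&#]*", re.IGNORECASE)

# Anchored: the name must directly follow '?' or '&', i.e. start a parameter.
ANCHORED = re.compile(r"(?<=[?&])" + SESSION_PARAMS + r"=[^&#]*", re.IGNORECASE)


def strip_session_id(url: str, pattern: re.Pattern) -> str:
    """Remove every match of `pattern` from `url`."""
    return pattern.sub("", url)


url = "http://example.com/article?newsId=42&sid=abc123"
damaged = strip_session_id(url, UNANCHORED)  # also eats part of "newsId=42"
cleaned = strip_session_id(url, ANCHORED)    # leaves "newsId=42" intact
```

With the unanchored pattern, the case-insensitive `sid` alternative also matches the tail of `newsId`, so a legitimate parameter is corrupted. The lookbehind in the anchored variant restricts matches to the start of a query parameter. This sketch leaves a trailing `&` behind, which a real normalizer rule would also need to clean up.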

[jira] [Updated] (NUTCH-706) Url regex normalizer

2012-07-10 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-706: Fix Version/s: 1.6 > Url regex normalizer > > > Key: NUTCH-