[ https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511985 ]
Hudson commented on NUTCH-505:
--
Integrated in Nutch-Nightly #147 (See
[ https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512071 ]
Espen Amble Kolstad commented on NUTCH-505:
---
Automaton (http://www.brics.dk/automaton/), used in
[ https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512074 ]
Doğacan Güney commented on NUTCH-505:
-
Thanks for the suggestion. Automaton really looks good, but using
[ https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512139 ]
Andrzej Bialecki commented on NUTCH-505:
-
Please test Java 1.5 and Java 1.6 - IIRC there are some
[ https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512201 ]
Doğacan Güney commented on NUTCH-505:
-
Andrzej, in my tests, java.util.regex is faster on both Java 1.5 and Java
[ https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511447 ]
Andrzej Bialecki commented on NUTCH-505:
-
* In ParseOutputFormat, the calculation of outlinksToStore should
, 2007 1:09:26 AM
Subject: [jira] Commented: (NUTCH-505) Outlink urls should be validated
[ https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507803 ]
Doğacan Güney commented on NUTCH-505:
-
btw, for http://www.variety.com/, these are the 'urls' filtered:
http:/
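The issue title says outlink URLs should be validated, and the comments above weigh java.util.regex against Automaton for that check. The sketch below illustrates only the general idea and is not the patch committed for NUTCH-505: a single precompiled java.util.regex pattern that rejects fragments such as the `http:/` string in the list above. The class name `OutlinkValidator` and the pattern are hypothetical. A real validator would need to cover much more URL syntax (userinfo, IPv6 hosts, other schemes).

```java
import java.util.regex.Pattern;

// Hypothetical sketch: not Nutch's actual outlink validation code.
public class OutlinkValidator {
  // Assumed rule: http/https/ftp scheme, "://", a non-empty host,
  // an optional numeric port, and an optional path with no whitespace.
  // Compiled once; Pattern instances are immutable and thread-safe.
  private static final Pattern URL_PATTERN = Pattern.compile(
      "^(https?|ftp)://[A-Za-z0-9.-]+(:\\d+)?(/[^\\s]*)?$");

  // Returns true only if the whole string matches the assumed rule.
  public static boolean isValid(String url) {
    return url != null && URL_PATTERN.matcher(url).matches();
  }

  public static void main(String[] args) {
    String[] candidates = {"http://www.variety.com/", "http:/", "http://"};
    for (String c : candidates) {
      System.out.println(c + " -> " + isValid(c));
    }
  }
}
```

Compiling the pattern once and calling `matcher(...)` per URL keeps per-outlink cost low. That cost is what the java.util.regex versus Automaton benchmarks in this thread were measuring.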