Build failed in Jenkins: Nutch-trunk #1525

2011-06-23, Apache Jenkins Server
See -- [...truncated 985 lines...] A src/plugin/subcollection/src/java/org/apache/nutch/collection A src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java A

[Nutch Wiki] Trivial Update of "Archive and Legacy" by LewisJohnMcgibbney

2011-06-23, Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "Archive and Legacy" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/Archive%20and%20Legacy?action=diff&rev1=9&rev2=10 === Internal Nutch Documentation

[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-23, Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "FrontPage" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=188&rev2=189 * IndexStructure * [[Getting_Started]] * JavaDe

[Nutch Wiki] Trivial Update of "Presentations" by LewisJohnMcgibbney

2011-06-23, Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "Presentations" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/Presentations?action=diff&rev1=14&rev2=15 Recent presentations: + * [[http://content.

[jira] [Created] (NUTCH-1012) Cannot handle illegal charset $charset

2011-06-23, Markus Jelsma (JIRA)
Cannot handle illegal charset $charset -- Key: NUTCH-1012 URL: https://issues.apache.org/jira/browse/NUTCH-1012 Project: Nutch Issue Type: Bug Components: parser Affects Versions: 1.3
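The issue title suggests the parser was handed the literal string `$charset` as a character-set name, presumably an unexpanded template placeholder in a page's metadata. The sketch below is not the NUTCH-1012 fix; it is a hypothetical helper (`safeCharset`) showing why such input needs a guard in Java: `Charset.forName` throws `IllegalCharsetNameException` for a name containing `$`, and `Charset.isSupported` throws the same exception instead of returning `false`, so a defensive lookup has to catch it and fall back.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

public class SafeCharset {
    // Hypothetical helper: resolve a charset name taken from page metadata,
    // returning the fallback when the name is missing, illegal, or unknown.
    static Charset safeCharset(String name, Charset fallback) {
        if (name == null) {
            return fallback;
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // "$charset" and "" land here: neither is a legal charset name.
            // A legal but unknown name lands here via UnsupportedCharsetException.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeCharset("$charset", StandardCharsets.UTF_8));   // UTF-8
        System.out.println(safeCharset("ISO-8859-1", StandardCharsets.UTF_8)); // ISO-8859-1
    }
}
```

Whether to log, fall back to a configured default, or run charset detection instead is a policy choice the sketch leaves open.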

[jira] [Commented] (NUTCH-1000) Add option not to commit to Solr

2011-06-23, Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054132#comment-13054132 ] Lewis John McGibbney commented on NUTCH-1000: - Hi Markus, I'm not on a work s

[jira] [Commented] (NUTCH-1011) Normalize duplicate slashes in URL's

2011-06-23, Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053953#comment-13053953 ] Markus Jelsma commented on NUTCH-1011: -- Oh, it gets better. It seems the used engine

[jira] [Updated] (NUTCH-1011) Normalize duplicate slashes in URL's

2011-06-23, Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1011: - Attachment: NUTCH-1011-all-3.patch HTML entities must be escaped properly! > Normalize duplicate

[jira] [Updated] (NUTCH-1011) Normalize duplicate slashes in URL's

2011-06-23, Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1011: - Attachment: (was: NUTCH-1011-all.patch) > Normalize duplicate slashes in URL's >

[jira] [Updated] (NUTCH-1011) Normalize duplicate slashes in URL's

2011-06-23, Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1011: - Attachment: (was: NUTCH-1011-all-2.patch) > Normalize duplicate slashes in URL's > --

[jira] [Issue Comment Edited] (NUTCH-1011) Normalize duplicate slashes in URL's

2011-06-23, Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053937#comment-13053937 ] Markus Jelsma edited comment on NUTCH-1011 at 6/23/11 3:55 PM: -

[jira] [Updated] (NUTCH-1011) Normalize duplicate slashes in URL's

2011-06-23, Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1011: - Attachment: NUTCH-1011-all-2.patch The previous regex seems to eat the character preceding the sl
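The comment above notes that the previous regex ate the character preceding the slashes. That is the usual symptom of guarding the match with a consumed negated class such as `[^:]` and then replacing the whole match with `/`. The sketch below contrasts that pattern with a zero-width negative lookbehind. It assumes the goal is to collapse runs of slashes everywhere except directly after the scheme's `:`; both regexes and the class name are illustrative and are not taken from the NUTCH-1011 patches.

```java
import java.util.regex.Pattern;

public class SlashNormalizer {
    // Buggy form: the [^:] character is part of the match, so replacing
    // the match with "/" silently deletes it.
    private static final Pattern EATS_PRECEDING = Pattern.compile("[^:]/{2,}");

    // Fixed form: (?<!:) only checks that the preceding character is not ':'
    // without consuming it, so just the slash run is replaced.
    private static final Pattern DUPLICATE_SLASHES = Pattern.compile("(?<!:)/{2,}");

    static String buggy(String url) {
        return EATS_PRECEDING.matcher(url).replaceAll("/");
    }

    static String normalize(String url) {
        return DUPLICATE_SLASHES.matcher(url).replaceAll("/");
    }

    public static void main(String[] args) {
        String url = "http://example.org/a//b///c";
        System.out.println(buggy(url));     // http://example.org///c  ("a" and "b" are lost)
        System.out.println(normalize(url)); // http://example.org/a/b/c
    }
}
```

One caveat of this simple lookbehind: it also rewrites `file:///tmp` to `file://tmp`, so a production normalizer would likely need to treat the scheme separator explicitly.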

[jira] [Updated] (NUTCH-1011) Normalize duplicate slashes in URL's

2011-06-23, Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1011: - Summary: Normalize duplicate slashes in URL's (was: Remove double slashes) > Normalize duplicate

[jira] [Updated] (NUTCH-1011) Remove double slashes

2011-06-23, Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1011: - Attachment: NUTCH-1011-all.patch Added new expression to detect double slashes that are _not_ par

[jira] [Created] (NUTCH-1011) Remove double slashes

2011-06-23, Markus Jelsma (JIRA)
Remove double slashes - Key: NUTCH-1011 URL: https://issues.apache.org/jira/browse/NUTCH-1011 Project: Nutch Issue Type: Improvement Affects Versions: 1.4, 2.0 Reporter: Markus Jelsma Assignee: