[jira] Commented: (NUTCH-835) document deduplication (exact duplicates) failed using MD5Signature

2010-07-01 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884540#action_12884540 ] Hudson commented on NUTCH-835: -- Integrated in Nutch-trunk #1195 (See [http://hudson.zones.apac

[jira] Created: (NUTCH-839) nutch doesnt run under 0.20.2+228-1~karmic-cdh3b1 version of hadoop

2010-07-01 Thread Robert Gonzalez (JIRA)
nutch doesnt run under 0.20.2+228-1~karmic-cdh3b1 version of hadoop. Key: NUTCH-839; URL: https://issues.apache.org/jira/browse/NUTCH-839; Project: Nutch; Issue Type: Bug

[jira] Commented: (NUTCH-831) Allow configuration of how fields crawled by Nutch are stored / indexed / tokenized

2010-07-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884300#action_12884300 ] Andrzej Bialecki commented on NUTCH-831: - In the future a maintenance patch like th

[jira] Commented: (NUTCH-831) Allow configuration of how fields crawled by Nutch are stored / indexed / tokenized

2010-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884302#action_12884302 ] Chris A. Mattmann commented on NUTCH-831: - Hey Andrzej, Exactly, I applied the patc

[jira] Updated: (NUTCH-831) Allow configuration of how fields crawled by Nutch are stored / indexed / tokenized

2010-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-831: Fix Version/s: 1.2 - applied to 1.2 branch > Allow configuration of how fields crawled by N

[jira] Work started: (NUTCH-838) Add timing information to all Tool classes

2010-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-838 started by Chris A. Mattmann. > Add timing information to all Tool classes > -- > > Key: NUTCH-83

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884296#action_12884296 ] Chris A. Mattmann commented on NUTCH-837: - hahah uh oh! I'll try and take a look be

[jira] Updated: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-837: Attachment: NUTCH-837.patch Warning - Nutch veterans may want to sit down before reading, be

[jira] Assigned: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki reassigned NUTCH-837. Assignee: Andrzej Bialecki > Remove search servers and Lucene dependencies

[jira] Commented: (NUTCH-836) Remove deprecated parse plugins

2010-07-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884269#action_12884269 ] Andrzej Bialecki commented on NUTCH-836: - +1, let's move on with this issue then -

[jira] Updated: (NUTCH-835) document deduplication (exact duplicates) failed using MD5Signature

2010-07-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-835: Fix Version/s: 1.2 > document deduplication (exact duplicates) failed using MD5Signature > -

[jira] Resolved: (NUTCH-835) document deduplication (exact duplicates) failed using MD5Signature

2010-07-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-835. Assignee: Andrzej Bialecki; Fix Version/s: 2.0; Resolution: Fixed. Fixed in

[jira] Commented: (NUTCH-835) document deduplication (exact duplicates) failed using MD5Signature

2010-07-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884255#action_12884255 ] Andrzej Bialecki commented on NUTCH-835: - Yes, this is a bug. In fact the implement

[jira] Commented: (NUTCH-836) Remove deprecated parse plugins

2010-07-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884253#action_12884253 ] Julien Nioche commented on NUTCH-836: - {quote} * do we still need lib-nekohtml ? {q

[jira] Commented: (NUTCH-836) Remove deprecated parse plugins

2010-07-01 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884252#action_12884252 ] Andrzej Bialecki commented on NUTCH-836: - Some comments: * do we still need lib-ne