[jira] [Commented] (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-04-08 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017405#comment-13017405 ] Markus Jelsma commented on NUTCH-963: - Yes! > Add support for deleting Solr documents

[jira] [Commented] (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-04-08 Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017403#comment-13017403 ] Julien Nioche commented on NUTCH-963: - Shall we create a new issue to track the progres

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-03-18 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008469#comment-13008469 ] Markus Jelsma commented on NUTCH-963: - Committed for branch-1.3 in rev 1082944. - new c

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-03-18 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008421#comment-13008421 ] Markus Jelsma commented on NUTCH-963: - Solr deduplication makes its own (fuzzy) hashes

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-03-18 Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008410#comment-13008410 ] Julien Nioche commented on NUTCH-963: - Re-dedup on SOLR side : good point, although the

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-03-18 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008402#comment-13008402 ] Markus Jelsma commented on NUTCH-963: - Julien, shouldn't the deduplicate mechanism kept

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-31 Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988722#comment-12988722 ] Julien Nioche commented on NUTCH-963: - {quote} @Julien: you mean to use the signature o
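
The signature-based deduplication discussed in this thread boils down to grouping indexed documents by a content signature and keeping one document per group. The sketch below is a standalone illustration, not Nutch's dedup job: `IndexedDoc` and `SignatureDedup` are hypothetical names, and the "keep the highest score, ties keep the first seen" rule is an assumption. Solr's own deduplication computes its own (possibly fuzzy) hashes, so it may group documents differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal view of an indexed document. */
class IndexedDoc {
    final String url;
    final String signature; // e.g. a content digest computed at indexing time
    final float score;

    IndexedDoc(String url, String signature, float score) {
        this.url = url;
        this.signature = signature;
        this.score = score;
    }
}

class SignatureDedup {
    /**
     * Returns the URLs to delete: for each signature, every document except
     * the highest-scoring one (on a tie, the first document seen is kept).
     * The keep rule is an assumption made for illustration.
     */
    static List<String> duplicatesToDelete(List<IndexedDoc> docs) {
        Map<String, IndexedDoc> best = new HashMap<>();
        List<String> toDelete = new ArrayList<>();
        for (IndexedDoc d : docs) {
            IndexedDoc kept = best.get(d.signature);
            if (kept == null) {
                best.put(d.signature, d);          // first document with this signature
            } else if (d.score > kept.score) {
                toDelete.add(kept.url);            // previous winner becomes a duplicate
                best.put(d.signature, d);
            } else {
                toDelete.add(d.url);               // lower or equal score: drop this one
            }
        }
        return toDelete;
    }
}
```

Returning the URLs rather than deleting directly keeps the grouping logic independent of whichever client eventually issues the deletes.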

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Claudio Martella (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987652#action_12987652 ] Claudio Martella commented on NUTCH-963: there's a little problem in where you put t

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Claudio Martella (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987617#action_12987617 ] Claudio Martella commented on NUTCH-963: @Markus: about the commit, I did also consi

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987574#action_12987574 ] Julien Nioche commented on NUTCH-963: - It would be nice to couple that with the deduplic

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987559#action_12987559 ] Markus Jelsma commented on NUTCH-963: - The class works fine although I did add a commit
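
The idea behind NUTCH-963, deleting Solr documents for URLs that the CrawlDB marks as STATUS_DB_GONE and then committing, can be sketched as a standalone simulation. This is not the patch from the issue: `SolrIndex` is a hypothetical in-memory stand-in (not the SolrJ API), the value `0x03` is assumed to mirror `CrawlDatum.STATUS_DB_GONE` in Nutch 1.x, and the URL is assumed to be the Solr document id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a Solr client; not the SolrJ API. */
class SolrIndex {
    final Map<String, String> docs = new LinkedHashMap<>();
    final List<String> pendingDeletes = new ArrayList<>();
    int commits = 0;

    void deleteById(String id) {
        pendingDeletes.add(id); // queued until the next commit
    }

    /** Deletions only become visible after a commit, as in Solr. */
    void commit() {
        for (String id : pendingDeletes) {
            docs.remove(id);
        }
        pendingDeletes.clear();
        commits++;
    }
}

class GoneDocumentPurger {
    // Assumed to mirror CrawlDatum.STATUS_DB_GONE in Nutch 1.x.
    static final byte STATUS_DB_GONE = 0x03;

    /** Queues a delete for every GONE URL, then commits once for the batch. */
    static int purge(Map<String, Byte> crawlDb, SolrIndex solr) {
        int deleted = 0;
        for (Map.Entry<String, Byte> e : crawlDb.entrySet()) {
            if (e.getValue() == STATUS_DB_GONE) {
                solr.deleteById(e.getKey()); // assumes the URL is the document id
                deleted++;
            }
        }
        if (deleted > 0) {
            solr.commit(); // without this, the deletes would never become visible
        }
        return deleted;
    }
}
```

Committing once after the loop, rather than after every delete, avoids paying the cost of a Solr commit per document while still making the deletions visible when the job finishes.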

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-26 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987132#action_12987132 ] Markus Jelsma commented on NUTCH-963: - Thanks Claudio. I'll fix the formatting and add a