[jira] [Created] (NUTCH-1666) Optimisation for BasicURLNormalizer

2013-11-11 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1666: Summary: Optimisation for BasicURLNormalizer Key: NUTCH-1666 URL: https://issues.apache.org/jira/browse/NUTCH-1666 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-1666) Optimisation for BasicURLNormalizer

2013-11-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1666: Attachment: NUTCH-1666.patch > Optimisation for BasicURLNormalizer

[jira] [Commented] (NUTCH-1666) Optimisation for BasicURLNormalizer

2013-11-11 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818817#comment-13818817 ] Markus Jelsma commented on NUTCH-1666: +1 > Optimisation for BasicURLNormalizer

[jira] [Resolved] (NUTCH-1666) Optimisation for BasicURLNormalizer

2013-11-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1666. Resolution: Fixed Committed revision 1540654. Thanks Markus! > Optimisation for BasicURLNorma

[jira] [Resolved] (NUTCH-1402) Create AbstractScoringFilter

2013-11-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1402. Resolution: Duplicate > Create AbstractScoringFilter

[jira] [Commented] (NUTCH-1324) DupeDB for Nutch

2013-11-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819041#comment-13819041 ] Julien Nioche commented on NUTCH-1324: Can't we achieve the same thing using the new

[jira] [Commented] (NUTCH-656) DeleteDuplicates based on crawlDB only

2013-11-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819042#comment-13819042 ] Julien Nioche commented on NUTCH-656: [~wastl-nagel] [~markus17] any chance you could

[jira] [Resolved] (NUTCH-1100) SolrDedup broken

2013-11-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1100. Resolution: Fixed Committed revision 1540758. We'll probably move to a more generic approach i

Partial document update Solr

2013-11-11 Thread erik rombouts
Hi all, I found the following message on the dev mailing list - http://www.mail-archive.com/dev@nutch.apache.org/msg10843.html - with a "patch" to allow Solr partial document updates via SolrWriter. There was a request from Julien to add this to JIRA, but I cannot seem to find it. Does anybody know

RE: Partial document update Solr

2013-11-11 Thread Markus Jelsma
Hi Erik, That's pretty straightforward to implement. I did it for our site search platform to cut down on transport costs, Lucene analysis and I/O in general. But it turned out to be quite useless once I found out that I hadn't read the docs that well and had to mark every field as stored. Solr needs every field to

Re: Partial document update Solr

2013-11-11 Thread erik rombouts
Hi Markus, Thanks for the quick and thorough answer. Yes, I did read that all the fields need to be stored for that, so I agree that in many cases this is not really the preferred solution. I am thinking about setting up a separate core in Solr with only a handful of fields and even fewer fields whi

[jira] [Commented] (NUTCH-1100) SolrDedup broken

2013-11-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819658#comment-13819658 ] Hudson commented on NUTCH-1100: SUCCESS: Integrated in Nutch-trunk #2419 (See [https://bui

[jira] [Commented] (NUTCH-1666) Optimisation for BasicURLNormalizer

2013-11-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819657#comment-13819657 ] Hudson commented on NUTCH-1666: SUCCESS: Integrated in Nutch-trunk #2419 (See [https://bui