[jira] [Updated] (NUTCH-1786) CrawlDb should follow db.url.normalizers and db.url.filters

2014-05-21 Thread Diaa (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Diaa updated NUTCH-1786: Attachment: crawldbPatch.patch nutch-default.xml.patch Patch nutch-default.xml, crawldb.java and cr

[jira] [Created] (NUTCH-1786) CrawlDb should follow db.url.normalizers and db.url.filters

2014-05-21 Thread Diaa (JIRA)
Diaa created NUTCH-1786: Summary: CrawlDb should follow db.url.normalizers and db.url.filters; Key: NUTCH-1786; URL: https://issues.apache.org/jira/browse/NUTCH-1786; Project: Nutch; Issue Type: Bug

[jira] [Commented] (NUTCH-1776) Log incorrect plugin.folder file path

2014-05-21 Thread Diaa (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005158#comment-14005158 ] Diaa commented on NUTCH-1776: +1 for your patch. Yes, isDirectory doesn't cover relative paths

[jira] [Commented] (NUTCH-1785) Ability to index raw content

2014-05-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005154#comment-14005154 ] Julien Nioche commented on NUTCH-1785: Hi Your patch contains [NUTCH-1758] but miss

[jira] [Resolved] (NUTCH-1757) ParserChecker to take custom metadata as input

2014-05-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1757. Resolution: Fixed trunk Committed revision 1596662. Thanks Markus for reviewing it! > ParserC

Re: Nutch survey

2014-05-21 Thread Markus Jelsma
Great! Done! :-) Julien Nioche wrote: Hi everyone! I had written a survey about Nutch and its uses and would be very grateful if you could take a couple of minutes to contribute: https://docs.google.com/forms/d/15Jg7dGoU2I1aHur3g5ia9qshCMES8hB1OLpf5q6sGXg/viewform This should help getting a c

Nutch survey

2014-05-21 Thread Julien Nioche
Hi everyone! I had written a survey about Nutch and its uses and would be very grateful if you could take a couple of minutes to contribute: https://docs.google.com/forms/d/15Jg7dGoU2I1aHur3g5ia9qshCMES8hB1OLpf5q6sGXg/viewform This should help getting a clearer picture of the wider Nutch commun

[jira] [Updated] (NUTCH-1785) Ability to index raw content

2014-05-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1785: Attachment: NUTCH-1785-trunk.patch Patch for trunk. Prepare a field name raw_content and add -rea

[jira] [Created] (NUTCH-1785) Ability to index raw content

2014-05-21 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-1785: Summary: Ability to index raw content; Key: NUTCH-1785; URL: https://issues.apache.org/jira/browse/NUTCH-1785; Project: Nutch; Issue Type: New Feature

[jira] [Commented] (NUTCH-1757) ParserChecker to take custom metadata as input

2014-05-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004701#comment-14004701 ] Markus Jelsma commented on NUTCH-1757: Ah, this makes sense now! Indeed unfortunatel

[jira] [Commented] (NUTCH-1486) Upgrade to the latest Solr 4.x

2014-05-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004623#comment-14004623 ] Markus Jelsma commented on NUTCH-1486: Anyone to take a look at this? Bright hints?

[jira] [Commented] (NUTCH-1746) OutOfMemoryError in Mappers

2014-05-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004545#comment-14004545 ] Julien Nioche commented on NUTCH-1746: Hi Greg Thanks for investigating this. Your

[jira] [Updated] (NUTCH-1746) OutOfMemoryError in Mappers

2014-05-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1746: Fix Version/s: 1.9 > OutOfMemoryError in Mappers > --- > >

[jira] [Commented] (NUTCH-1757) ParserChecker to take custom metadata as input

2014-05-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004520#comment-14004520 ] Julien Nioche commented on NUTCH-1757: Hi Markus bq. metadata is passed via CrawlDa

[jira] [Updated] (NUTCH-1757) ParserChecker to take custom metadata as input

2014-05-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1757: Attachment: NUTCH-1757.patch.v2 > ParserChecker to take custom metadata as input > --

[jira] [Commented] (NUTCH-1758) IndexChecker to send document to IndexWriters

2014-05-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004447#comment-14004447 ] Julien Nioche commented on NUTCH-1758: Thanks Markus. Indeed, I did not think about