[
https://issues.apache.org/jira/browse/NUTCH-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Diaa updated NUTCH-1786:
Attachment: crawldbPatch.patch
nutch-default.xml.patch
Patch nutch-default.xml, crawldb.java and cr
Diaa created NUTCH-1786:
---
Summary: CrawlDb should follow db.url.normalizers and db.url.filters
Key: NUTCH-1786
URL: https://issues.apache.org/jira/browse/NUTCH-1786
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005158#comment-14005158
]
Diaa commented on NUTCH-1776:
-
+1 for your patch. Yes isDirectory doesn't cover relative paths
[
https://issues.apache.org/jira/browse/NUTCH-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005154#comment-14005154
]
Julien Nioche commented on NUTCH-1785:
--
Hi
Your patch contains [NUTCH-1758] but miss
[
https://issues.apache.org/jira/browse/NUTCH-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-1757.
--
Resolution: Fixed
trunk Committed revision 1596662.
Thanks Markus for reviewing it!
> ParserC
Great! Done! :-)
Julien Nioche wrote:
Hi everyone!
I had written a survey about Nutch and its uses and would be very grateful
if you could take a couple of minutes to contribute :
https://docs.google.com/forms/d/15Jg7dGoU2I1aHur3g5ia9qshCMES8hB1OLpf5q6sGXg/viewform
This should help getting a c
Hi everyone!
I had written a survey about Nutch and its uses and would be very grateful
if you could take a couple of minutes to contribute :
https://docs.google.com/forms/d/15Jg7dGoU2I1aHur3g5ia9qshCMES8hB1OLpf5q6sGXg/viewform
This should help getting a clearer picture of the wider Nutch commun
[
https://issues.apache.org/jira/browse/NUTCH-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1785:
-
Attachment: NUTCH-1785-trunk.patch
Patch for trunk. Prepare a field name raw_content and add -rea
Markus Jelsma created NUTCH-1785:
Summary: Ability to index raw content
Key: NUTCH-1785
URL: https://issues.apache.org/jira/browse/NUTCH-1785
Project: Nutch
Issue Type: New Feature
[
https://issues.apache.org/jira/browse/NUTCH-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004701#comment-14004701
]
Markus Jelsma commented on NUTCH-1757:
--
Ah, this makes sense now! Indeed unfortunatel
[
https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004623#comment-14004623
]
Markus Jelsma commented on NUTCH-1486:
--
Anyone to take a look at this? Bright hints?
[
https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004545#comment-14004545
]
Julien Nioche commented on NUTCH-1746:
--
Hi Greg
Thanks for investigating this. Your
[
https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1746:
-
Fix Version/s: 1.9
> OutOfMemoryError in Mappers
> ---
>
>
[
https://issues.apache.org/jira/browse/NUTCH-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004520#comment-14004520
]
Julien Nioche commented on NUTCH-1757:
--
Hi Markus
bq. metadata is passed via CrawlDa
[
https://issues.apache.org/jira/browse/NUTCH-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1757:
-
Attachment: NUTCH-1757.patch.v2
> ParserChecker to take custom metadata as input
> --
[
https://issues.apache.org/jira/browse/NUTCH-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004447#comment-14004447
]
Julien Nioche commented on NUTCH-1758:
--
Thanks Markus. Indeed, I did not think about