[
https://issues.apache.org/jira/browse/NUTCH-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659610#action_12659610
]
Otis Gospodnetic commented on NUTCH-171:
But does generate.update.crawldb=true
[
https://issues.apache.org/jira/browse/NUTCH-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659616#action_12659616
]
Andrzej Bialecki commented on NUTCH-171:
It's true this is not the same. However,
[
https://issues.apache.org/jira/browse/NUTCH-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659639#action_12659639
]
Otis Gospodnetic commented on NUTCH-171:
Hm, yes, it's nice to be able to
[
https://issues.apache.org/jira/browse/NUTCH-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659650#action_12659650
]
Rod Taylor commented on NUTCH-171:
I'm not interested in chasing this down. We have worked
[
https://issues.apache.org/jira/browse/NUTCH-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659341#action_12659341
]
Andrzej Bialecki commented on NUTCH-171:
Recent versions of Nutch offer an option
[
http://issues.apache.org/jira/browse/NUTCH-171?page=comments#action_12372556 ]
Doug Cutting commented on NUTCH-171:
Ideally we could overlap segment2 map with segment1 reduce to keep bandwidth
usage constant.
Overlapping map2 with reduce1 should
[
http://issues.apache.org/jira/browse/NUTCH-171?page=comments#action_12372588 ]
Rod Taylor commented on NUTCH-171:
One thing that's needed is the ability to mark urls as being fetched, which
was in 0.7 but has not yet made it into 0.8. In addition, we
[
http://issues.apache.org/jira/browse/NUTCH-171?page=comments#action_12372597 ]
Doug Cutting commented on NUTCH-171:
Generate for 20 segments of 10M in size is almost as fast as 1 segment that
is 10M in size. A single 200M URL segment is unwieldy
[
http://issues.apache.org/jira/browse/NUTCH-171?page=comments#action_12372602 ]
Rod Taylor commented on NUTCH-171:
How is a 200M URL segment unwieldy?
There are two reasons why I have found this. First, Nutch still has a bad habit
of not completing a
[
http://issues.apache.org/jira/browse/NUTCH-171?page=comments#action_12362507 ]
Doug Cutting commented on NUTCH-171:
I'd like to hear more about why you want multiple segments, what's motivating
this patch. The 0.7 -numFetchers parameter was designed