[jira] Commented: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement

2007-03-21 Thread Michael Gillis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483003 ] Michael Gillis commented on NUTCH-246: I don't see any commit on this issue -- is it actually fixed in 0.9? It l

[jira] Commented: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement

2006-04-12 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-246?page=comments#action_12374279 ] Doug Cutting commented on NUTCH-246: Looks good, although I think I'd put the setFetchTime in the mapper, where the CrawlDatum is constructed, rather than in the reducer.

[jira] Commented: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement

2006-04-12 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-246?page=comments#action_12374272 ] Doug Cutting commented on NUTCH-246: > It seems like the Injector should be loading the current time from a job > configuration property in the same way that the Gener

[jira] Commented: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement

2006-04-12 Thread Chris Schneider (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-246?page=comments#action_12374253 ] Chris Schneider commented on NUTCH-246: As it turns out, this problem was due to a time synchronization problem between the jobtracker and the tasktrackers. When the URLs were i
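
The clock-skew explanation in this comment can be illustrated with a small sketch. It is not Nutch code: `CrawlDatum`, `inject`, and `generate` here are hypothetical stand-ins, and the sketch assumes that the generator skips any entry whose fetch time is later than its own notion of the current time. Under that assumption, URLs stamped by a node whose clock runs ahead look "not yet due" and never reach the segment, while stamping every URL from one shared timestamp (for example, a value passed through the job configuration) avoids the loss.

```python
from dataclasses import dataclass


@dataclass
class CrawlDatum:
    """Hypothetical stand-in for a crawl DB entry (not the Nutch class)."""
    url: str
    fetch_time_ms: int


def inject(urls, now_ms):
    """Stamp each injected URL with the timestamp supplied by the caller."""
    return [CrawlDatum(u, now_ms) for u in urls]


def generate(crawl_db, cur_time_ms, top_n):
    """Select up to top_n entries that are due, assuming entries with a
    fetch time later than cur_time_ms are skipped."""
    due = [d for d in crawl_db if d.fetch_time_ms <= cur_time_ms]
    return due[:top_n]


# Illustrative values only; neither number comes from the issue.
JOB_TIME_MS = 1_144_800_000_000   # time as seen by the node submitting the job
SKEW_MS = 90_000                  # a worker whose clock runs 90 s ahead

urls = [f"http://example.com/{i}" for i in range(10)]

# Case 1: each task stamps URLs with its own local clock, and half of the
# tasks run on a node whose clock is ahead.
skewed_db = inject(urls[:5], JOB_TIME_MS) + inject(urls[5:], JOB_TIME_MS + SKEW_MS)

# Case 2: every task reads the same timestamp handed down with the job.
shared_db = inject(urls, JOB_TIME_MS)

skewed_segment = generate(skewed_db, JOB_TIME_MS, top_n=10)
shared_segment = generate(shared_db, JOB_TIME_MS, top_n=10)

print(len(skewed_segment), len(shared_segment))
```

With the skewed clocks, only the entries stamped at or before the generator's time are selected, so the segment comes out smaller than topN even though the crawl DB holds enough URLs.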

[jira] Commented: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement

2006-04-11 Thread Chris Schneider (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-246?page=comments#action_12374049 ] Chris Schneider commented on NUTCH-246: A few more details: Stefan and I were able to reproduce this problem using either an injection set of 4500 URLs or a larger set