[jira] Commented: (NUTCH-25) needs 'character encoding' detector

2007-07-25 Thread Doug Cook (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515461 ] Doug Cook commented on NUTCH-25: > Can you provide a link on icu4j's language detection? http://www.icu-project.org/a

Re: [jira] Commented: (NUTCH-527) MapWritable doesn't support all hadoops writable types

2007-07-25 Thread Doğacan Güney
On 7/25/07, Robert Young <[EMAIL PROTECTED]> wrote: The message which was appearing in the logs is pasted below. Basically, in org.apache.nutch.crawl.MapWritable#getKeyValueEntry the Writable is instantiated. Its class is determined by a two-byte code (which is written to crawldb I guess); if t

[jira] Commented: (NUTCH-25) needs 'character encoding' detector

2007-07-25 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515365 ] Doğacan Güney commented on NUTCH-25: [snip snip] > Internal to guessEncoding, we could certainly add the clue valu

Re: [jira] Commented: (NUTCH-527) MapWritable doesn't support all hadoops writable types

2007-07-25 Thread Robert Young
The message which was appearing in the logs is pasted below. Basically, in org.apache.nutch.crawl.MapWritable#getKeyValueEntry the Writable is instantiated. Its class is determined by a two-byte code (which is written to crawldb I guess); if there is no entry for the class it fails to create it,

[jira] Commented: (NUTCH-25) needs 'character encoding' detector

2007-07-25 Thread Doug Cook (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515342 ] Doug Cook commented on NUTCH-25: Doğacan, Thanks for the quick feedback. > * EncodingDetector api is way too open. IM

[jira] Commented: (NUTCH-527) MapWritable doesn't support all hadoops writable types

2007-07-25 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515283 ] Doğacan Güney commented on NUTCH-527: What was the error you were having? MapWritable supports reading and writin

CrawlDbReader TopN

2007-07-25 Thread Emmanuel
I've been through the code of the CrawlDbReader class. I discovered the method "processTopNJob", which uses the classes CrawlDbTopNMapper and CrawlDbTopNReducer. I'm wondering why we have this function. Is it an old implementation that was used before the Generator to get the TopN links to Fetch or

[jira] Updated: (NUTCH-527) MapWritable doesn't support all hadoops writable types

2007-07-25 Thread Rob Young (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Young updated NUTCH-527: Attachment: mapwritable.patch I am not sure what the second parameter is so this may not be right. However,

[jira] Commented: (NUTCH-524) Generate Problem with Single Node

2007-07-25 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515259 ] Doğacan Güney commented on NUTCH-524: Have you tried playing with max.threads.per.host option instead? If you set

[jira] Updated: (NUTCH-527) MapWritable doesn't support all hadoops writable types

2007-07-25 Thread Rob Young (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Young updated NUTCH-527: Description: The map of classes which implement org.apache.hadoop.io.Writable is not complete. It does not,

[jira] Created: (NUTCH-527) MapWritable doesn't support all hadoops writable types

2007-07-25 Thread Rob Young (JIRA)
MapWritable doesn't support all hadoops writable types. Key: NUTCH-527; URL: https://issues.apache.org/jira/browse/NUTCH-527; Project: Nutch; Issue Type: Bug; Affects Versions: 0.9.0

[jira] Commented: (NUTCH-25) needs 'character encoding' detector

2007-07-25 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515230 ] Doğacan Güney commented on NUTCH-25: Overall I think the idea behind EncodingDetector is very solid. I will take a