Re: Strange RemoteException thrown while doing a parse of ~64m documents

2007-10-07 Thread Ned Rockson
This was a normal Nutch parse. I'm still not sure what was causing the bug, but it stopped last week.
On 10/7/07, Dennis Kubes <[EMAIL PROTECTED]> wrote:
> This happens when two reduce tasks try to write to the same output
> folder, usually on the dfs. Was this a Nutch Parse job or a custom Map

Re: Strange RemoteException thrown while doing a parse of ~64m documents

2007-10-07 Thread Dennis Kubes
This happens when two reduce tasks try to write to the same output folder, usually on the dfs. Was this a Nutch Parse job or a custom Map Reduce job?
Dennis Kubes
Ned Rockson wrote:
> This is the second time I've run this large parse of ~64m documents. In the reduce phase, both times through t

Re: Java Packages (missing)

2007-10-07 Thread Dennis Kubes
These are classes from plugins and are therefore in their specific plugin src directories. For example, the regex URL normalizer is found at: NutchTrunk\src\plugin\urlnormalizer-regex\src\java\org\apache\nutch\net\urlnormalizer\regex\RegexURLNormalizer.java
Dennis Kubes
Sagar Vibhute wrote:
> Hello,

[jira] Issue Comment Edited: (NUTCH-562) Port mime type framework to use Tika mime detection framework

2007-10-07 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532972 ] chrismattmann edited comment on NUTCH-562 at 10/7/07 8:34 AM: -- Initial patch for comm

[jira] Updated: (NUTCH-562) Port mime type framework to use Tika mime detection framework

2007-10-07 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-562: Attachment: tika-0.1-dev.jar Tika 0.1 unreleased jar file -- drop this in $NUTCH_SRC_HOME/lib

[jira] Updated: (NUTCH-562) Port mime type framework to use Tika mime detection framework

2007-10-07 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-562: Attachment: NUTCH-562.Mattmann.patch.txt Initial patch for comments: 1. This patch removes

Java Packages (missing)

2007-10-07 Thread Sagar Vibhute
Hello, Does the default Nutch 0.9 package come with certain Java packages missing? I could compile the source (I downloaded the tarball, not from svn) using ant. But when I start crawling, it throws a ClassNotFoundException, like: java.lang.ClassNotFoundException: org.apache.nutch.net.urlno

Re: First Plugin

2007-10-07 Thread Sagar Vibhute
I started a crawl after adding a plugin given on the wiki (http://wiki.apache.org/nutch/WritingPluginExample-0%2e9). When I crawled, it stopped after throwing an exception. Here is what the hadoop.log file says: