Hi again Lewis,

I finally managed to index my site with your help. But when it was nearing the end, I got this error:

Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)

What should I do about it?

Regards,

On 5/9/12 3:45 PM, Lewis John Mcgibbney wrote:
add an agent name to the http.agent.name property in nutch-site.xml

If this is the only warning that you receive then it should solve it.
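
For reference, a minimal sketch of what such an entry might look like inside the <configuration> element of nutch-site.xml. The value "MyNutchSpider" is only a placeholder assumption, not something from this thread; any non-empty name that identifies your crawler should serve.

```xml
<!-- Goes inside <configuration> ... </configuration> in nutch-site.xml -->
<property>
  <name>http.agent.name</name>
  <!-- Placeholder value (assumption): replace with your own crawler name -->
  <value>MyNutchSpider</value>
</property>
```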

hth

Lewis

On Wed, May 9, 2012 at 12:35 PM, Tolga <[email protected]> wrote:
I've read that and done accordingly, I still get that error.

On 5/9/12 2:31 PM, Lewis John Mcgibbney wrote:
good to hear.

please see the tutorial for all required configuration

http://wiki.apache.org/nutch/NutchTutorial

On Wed, May 9, 2012 at 11:51 AM, Tolga <[email protected]> wrote:
Dear Lewis,

I've done as you said, and it's beginning to work, except that it's
complaining that http.agent.name has not been set. The tutorial I read
states that I don't need to fill it in, but apparently I do. What should
this be?

On 5/9/12 1:25 PM, Lewis John Mcgibbney wrote:
Hi Tolga,

If you use Nutch in local mode, you can navigate to
nutch/runtime/local and set the NUTCH_HOME environment variable to that
directory. If you then run either individual commands via bin/nutch or
the crawl command within the same script, you do not need to worry
about your classpath.

Does this make sense?

On Wed, May 9, 2012 at 8:03 AM, Tolga <[email protected]> wrote:
Sorry, there are actually .jar files under the directory, but I still
can't figure out what path to export to CLASSPATH.


-------- Original Message --------
Subject:        CLASSPATH
Date:   Wed, 09 May 2012 10:00:53 +0300
From:   Tolga<[email protected]>
To:     [email protected]



Hi,

This is my very first post to the list. In fact, I heard of Nutch only
yesterday.

Anyway, I'm trying to figure out what path to set CLASSPATH to.
Tutorials tell me it needs to be where my .jar files are. However,
there are no .jar files under the apache-nutch directory. So, please
help me figure this out.

Regards,



