Hello,

You cannot run Nutch's JAR directly on Hadoop like that; you need the
large .job file instead. If you build Nutch from source, you will get a
runtime/deploy directory. Upload its contents to a Hadoop client and run
Nutch commands using bin/nutch ... You will then automatically use the
large .job file that sits at the same level as the bin directory.
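
As a rough sketch of those steps, assuming the Nutch source lives in a
hypothetical $NUTCH_SRC directory and the seed and segment paths are
placeholders on HDFS, something like the following should be close. It
prints the commands by default instead of running them; `ant runtime`
and `bin/nutch freegen` are the standard build target and command name
for FreeGenerator, everything else is an assumption to adapt.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Call with RUN="" in the environment to actually execute them.
RUN="${RUN-echo}"

# Hypothetical location of the unpacked Nutch source tree.
NUTCH_SRC="${NUTCH_SRC:-$HOME/apache-nutch-1.19}"

build_and_generate() {
  # $1: input directory with seed URL files, $2: segments directory
  # (both are resolved on the cluster's filesystem in deploy mode).

  # Build Nutch; this creates runtime/local and runtime/deploy.
  $RUN ant -f "$NUTCH_SRC/build.xml" runtime

  # Run from runtime/deploy so that bin/nutch submits the .job file
  # that sits next to the bin directory.
  $RUN "$NUTCH_SRC/runtime/deploy/bin/nutch" freegen "$1" "$2"
}

build_and_generate /crawl/urls /crawl/segments
```

Because bin/nutch in runtime/deploy finds the .job file by itself, there
is no need to pass -conf or -Dplugin.folder by hand.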

Application log files on Hadoop are spread across the individual tasks
rather than kept in one place. In the job's web UI, select individual
map or reduce subtasks, click through to their attempts, and inspect
their logs. That is where the application logs are to be found.
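
If clicking through the UI is tedious, the same task logs can usually be
fetched from the command line. The sketch below assumes log aggregation
is enabled on the cluster (yarn.log-aggregation-enable) and relies on a
MapReduce job id and its YARN application id sharing the same numeric
suffix. The helper names are made up for illustration, and the command
is only printed unless RUN is set to an empty string.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Call with RUN="" in the environment to actually execute it.
RUN="${RUN-echo}"

job_to_app_id() {
  # job_<timestamp>_<seq> and application_<timestamp>_<seq> refer to
  # the same YARN application, so only the prefix changes.
  printf 'application_%s\n' "${1#job_}"
}

fetch_logs() {
  # $1: the job id reported in the failure message
  $RUN yarn logs -applicationId "$(job_to_app_id "$1")"
}

fetch_logs job_1665751705815_0007
```

The output contains the stdout, stderr and syslog of every task attempt,
so the stack trace of the failed map task should show up there.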

Good luck!
Markus

On Fri, 14 Oct 2022 at 16:18, Mike <mz579...@gmail.com> wrote:

> Hi!
>
> I've been using Nutch for a while but I'm new to Hadoop. I have a cluster
> with Hadoop 3.2.3 installed.
>
> Do I have to install Nutch on the Hadoop filesystem, or can I run it
> "local"? The clients don't need more from Nutch than the info on master in
> the command line:
>
> hadoop jar /home/debian/nutch40/lib/apache-nutch-1.19.jar \
>   org.apache.nutch.tools.FreeGenerator \
>   -conf /home/debian/nutch40/conf/nutch-default.xml \
>   -Dplugin.folder=/home/debian/nutch40/plugins/ \
>   /crawl/urls//tranco-top350k-20221007.txt /home/debian/crawl/segments/
>
> I get an error on the command:
>
> Exception in thread "main" java.lang.RuntimeException: FreeGenerator job
> did not succeed, job id: job_1665751705815_0007, job status: FAILED,
> reason: Task failed task_1665751705815_0007_m_000000
>
>
> Since I'm new, I can't find the logs in Hadoop yet.
>
> Is there a guide on how to install Nutch (1.19) on Hadoop that I can't find?
>
> Thanks
> Mike
>
