Hi Hiran,

> Yet with this mail I'd like to point out the below error is useless. No
> information given means no way to investigate or troubleshoot.

Could you also share the output from hadoop.log, located in the logs
folder? The log messages there are often more informative.

Why are there multiple logs? Nutch is built on Hadoop MapReduce, which
is a distributed system, and the logs are distributed as well: only the
job client logs to stdout, while the tasks (which do the work) log to
hadoop.log when run in local mode. In distributed mode, the log location
and aggregation are configured globally for all Hadoop jobs.
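
To pull only the problem entries out of a long hadoop.log, a small
script can help. The following is a minimal sketch, not part of Nutch.
It assumes the log sits at logs/hadoop.log relative to the directory
Nutch is run from, and that every entry starts with a timestamp and a
level, as in the excerpt quoted in this thread. The function name
problem_entries is made up for illustration.

```python
import re
from pathlib import Path

# Assumption: a new log entry starts with "YYYY-MM-DD HH:MM:SS,mmm LEVEL".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+(\w+)")


def problem_entries(lines, levels=("WARN", "ERROR", "FATAL")):
    """Return log entries at the given levels, keeping continuation lines
    (for example stack traces) attached to the entry they follow."""
    entries = []
    current = None  # lines of the entry being collected, or None
    for line in lines:
        line = line.rstrip("\n")
        match = ENTRY_START.match(line)
        if match:
            # A new entry begins: store the previous one if it was kept.
            if current is not None:
                entries.append("\n".join(current))
            current = [line] if match.group(1) in levels else None
        elif current is not None:
            # Continuation line, e.g. "    at org.apache...".
            current.append(line)
    if current is not None:
        entries.append("\n".join(current))
    return entries


if __name__ == "__main__":
    log_file = Path("logs/hadoop.log")  # assumed location
    if log_file.exists():
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            for entry in problem_entries(handle):
                print(entry)
                print("-" * 60)
```

Because stack traces stay attached to their entry, the exception thrown
inside a failed map task (which the job client only reports as
"reason: NA") should show up together with its cause, provided the task
logged it.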


> Even if I had a typo in my plugin.includes
> regex, the intended plugins would just not be loaded - but some missing
> plugins should not let Nutch crash like that.

If the configuration is erroneous, it's often better to fail entirely.
Otherwise the workflow keeps running and, among the long log output, the
error message may go unnoticed for a long time.
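
The same fail-early idea can be applied to the configuration itself,
before starting a crawl. The following is a minimal sketch, not Nutch
code. It assumes the plugin.includes value is matched against complete
plugin ids, and it uses Python's regex engine, which can differ from
Java's in corner cases. The function names, the regex and the list of
expected plugin ids are made up for illustration.

```python
import re


def unmatched_plugins(includes_regex, expected_ids):
    """Return the expected plugin ids that the regex does not match in full."""
    pattern = re.compile(includes_regex)  # raises re.error on a malformed regex
    return [pid for pid in expected_ids if pattern.fullmatch(pid) is None]


def check_plugin_includes(includes_regex, expected_ids):
    """Fail loudly if any expected plugin id would be left out."""
    missing = unmatched_plugins(includes_regex, expected_ids)
    if missing:
        raise ValueError("plugin.includes does not match: " + ", ".join(missing))


if __name__ == "__main__":
    # Hypothetical value with a typo: "index-basc" instead of "index-basic".
    includes = r"protocol-http|urlfilter-regex|parse-(html|tika)|index-basc"
    expected = ["protocol-http", "urlfilter-regex", "parse-html", "index-basic"]
    try:
        check_plugin_includes(includes, expected)
    except ValueError as err:
        print(err)
```

A check like this reports the typo at once instead of leaving it to be
found later among the log output.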


Best,
Sebastian

On 10/11/24 23:17, Hiran Chaudhuri wrote:
In a working Nutch installation, I edited nutch-site.xml to add a few
plugins.
As a result, the seeding phase no longer works. I get the error below.

Now I can obviously investigate which plugins I added, take it step by
step etc. to analyze what happened.
Yet with this mail I'd like to point out the below error is useless. No
information given means no way to investigate or troubleshoot. Could
Nutch not give a better explanation, even if that required raising
the log level for details? Even if I had a typo in my plugin.includes
regex, the intended plugins would just not be loaded - but some missing
plugins should not let Nutch crash like that.


2024-10-11 23:09:56,133 INFO org.apache.nutch.crawl.Injector [main]
Injecting seed URL file
file:/home/hiran/NetBeansProjects/nutch/urls/seed.txt
2024-10-11 23:09:56,475 INFO
org.apache.nutch.urlfilter.regex.RegexURLFilter [LocalJobRunner Map Task
Executor #0] Reading urlfilter-regex rules file: regex-urlfilter.txt
2024-10-11 23:09:56,475 INFO
org.apache.nutch.urlfilter.api.RegexURLFilterBase [LocalJobRunner Map
Task Executor #0] Read 5 regex rules
(org.apache.nutch.urlfilter.regex.RegexURLFilter)
2024-10-11 23:09:57,389 ERROR org.apache.nutch.crawl.Injector [main]
Injector job did not succeed, job id: job_local167986338_0001, job
status: FAILED, reason: NA
2024-10-11 23:09:57,390 ERROR org.apache.nutch.crawl.Injector [main]
Injector: java.lang.RuntimeException: Injector job did not succeed, job
id: job_local167986338_0001, job status: FAILED, reason: NA
     at org.apache.nutch.crawl.Injector.inject(Injector.java:446)
     at org.apache.nutch.crawl.Injector.run(Injector.java:574)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
     at org.apache.nutch.crawl.Injector.main(Injector.java:538)


