Hi Hiran,

> Yet with this mail I'd like to point out the below error is useless. No
> information given means no way to investigate or troubleshoot.

Could you also share the output in the hadoop.log located in the logs folder? The log messages there are often more informative.

Why are there multiple logs? Nutch is built on Hadoop MapReduce, which is a distributed system, and its logs are distributed as well: only the job client logs to stdout, while the tasks (which do the work) log to hadoop.log when run in local mode. In distributed mode the log location and aggregation are configured globally for all Hadoop jobs.

> Even if I had a typo in my plugins.includes
> regex, the intended plugins would just not be loaded - but some missing
> plugins should not let Nutch crash like that.

If the configuration is erroneous, it's often better to fail entirely. Otherwise the workflow runs and, among the long log output, the error message may go unnoticed for a long time.

Best,
Sebastian

On 10/11/24 23:17, Hiran Chaudhuri wrote:
On a working Nutch installation I edited nutch-site.xml to add a few plugins. As a result the seeding phase no longer works. I get the below error.

Now I can obviously investigate which plugins I added, take it step by step etc. to analyze what happened.

Yet with this mail I'd like to point out the below error is useless. No information given means no way to investigate or troubleshoot. Could Nutch not try to give some better explanation - even if it were to increase the loglevel for details? Even if I had a typo in my plugins.includes regex, the intended plugins would just not be loaded - but some missing plugins should not let Nutch crash like that.

2024-10-11 23:09:56,133 INFO org.apache.nutch.crawl.Injector [main] Injecting seed URL file file:/home/hiran/NetBeansProjects/nutch/urls/seed.txt
2024-10-11 23:09:56,475 INFO org.apache.nutch.urlfilter.regex.RegexURLFilter [LocalJobRunner Map Task Executor #0] Reading urlfilter-regex rules file: regex-urlfilter.txt
2024-10-11 23:09:56,475 INFO org.apache.nutch.urlfilter.api.RegexURLFilterBase [LocalJobRunner Map Task Executor #0] Read 5 regex rules (org.apache.nutch.urlfilter.regex.RegexURLFilter)
2024-10-11 23:09:57,389 ERROR org.apache.nutch.crawl.Injector [main] Injector job did not succeed, job id: job_local167986338_0001, job status: FAILED, reason: NA
2024-10-11 23:09:57,390 ERROR org.apache.nutch.crawl.Injector [main] Injector: java.lang.RuntimeException: Injector job did not succeed, job id: job_local167986338_0001, job status: FAILED, reason: NA
    at org.apache.nutch.crawl.Injector.inject(Injector.java:446)
    at org.apache.nutch.crawl.Injector.run(Injector.java:574)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
    at org.apache.nutch.crawl.Injector.main(Injector.java:538)
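
The reply suggests looking in hadoop.log for the task-side messages that the job client does not print. As a rough sketch of that check, the Python helper below pulls out WARN/ERROR/FATAL entries together with their stack trace lines. It is not part of Nutch: the function name extract_problems, the path logs/hadoop.log relative to the working directory, and the assumption that hadoop.log uses the same timestamp/level layout as the console output quoted above are all illustrative assumptions.

```python
import re
from pathlib import Path

# Start of a log entry, in the layout of the console output quoted above, e.g.
# "2024-10-11 23:09:57,389 ERROR org.apache.nutch.crawl.Injector [main] ..."
# (assumption: hadoop.log uses the same layout)
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (\w+) ")


def extract_problems(lines, levels=("WARN", "ERROR", "FATAL")):
    """Return log entries at the given levels, each joined with its
    continuation lines (stack trace frames, "Caused by:" lines)."""
    entries = []
    current = None  # entry being collected, or None while skipping
    for line in lines:
        line = line.rstrip("\n")
        match = ENTRY_START.match(line)
        if match:
            # A new timestamped entry starts; keep it only if its level matches.
            current = [line] if match.group(1) in levels else None
            if current is not None:
                entries.append(current)
        elif current is not None and line.strip():
            # Non-empty continuation line of the entry being collected.
            current.append(line)
    return ["\n".join(entry) for entry in entries]


if __name__ == "__main__":
    log_file = Path("logs/hadoop.log")  # assumed location; adjust as needed
    if log_file.exists():
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            for entry in extract_problems(handle):
                print(entry)
                print("-" * 72)
    else:
        print(f"{log_file} not found; adjust the path")
```

Run from the directory that contains the logs folder, this prints only the problem entries, which is usually where the actual cause of a "reason: NA" job failure would show up if the task logged one.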

