Hi Mike,
The Nutch configuration files are packed into the job file found in
runtime/deploy after the build. This means you need to build Nutch yourself
if you run it in "distributed" mode, and rebuild it after changing the
configuration files.
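Because the configuration travels inside the job file, it can help to check what actually got packed before launching a crawl. Below is a minimal Python sketch that does this. It assumes the .job file is a zip (jar) archive and that Nutch's config files are named nutch-*.xml inside it. The helper names list_config_files and read_property are hypothetical, and http.agent.name is used only as an example property.

```python
"""Inspect the Nutch configuration packed into a job file (sketch).

Assumptions: the .job file is a zip (jar) archive, and the Nutch config
files inside it are named nutch-*.xml (e.g. nutch-default.xml,
nutch-site.xml). Helper names are hypothetical, not part of Nutch.
"""
import fnmatch
import glob
import posixpath
import xml.etree.ElementTree as ET
import zipfile


def list_config_files(job_path):
    """Return the sorted archive entries whose file name matches nutch-*.xml.

    job_path may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(job_path) as job:
        return sorted(
            name for name in job.namelist()
            if fnmatch.fnmatch(posixpath.basename(name), "nutch-*.xml")
        )


def read_property(job_path, prop_name, config_file="nutch-site.xml"):
    """Return the value of prop_name from the packed config_file, or None."""
    with zipfile.ZipFile(job_path) as job:
        for entry in job.namelist():
            if posixpath.basename(entry) != config_file:
                continue
            root = ET.fromstring(job.read(entry))
            # Hadoop-style layout:
            # <configuration><property><name/><value/></property>...
            for prop in root.iter("property"):
                if prop.findtext("name") == prop_name:
                    return prop.findtext("value")
    return None


if __name__ == "__main__":
    # Assumed location of the job file after building Nutch.
    for job_file in glob.glob("runtime/deploy/*.job"):
        print(job_file)
        for name in list_config_files(job_file):
            print("   ", name)
        print("    http.agent.name =", read_property(job_file, "http.agent.name"))
```

If a value printed here is not what you expect, the job file was probably built before the configuration was edited.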
For practice, you can first work in "pseudo-distributed" mode, i.e.
on a single-node Hadoop cluster. All commands are the same as in fully
distributed mode.
If it helps, I prepared some setup scripts to run Nutch in pseudo-distributed
mode:
https://github.com/sebastian-nagel/nutch-test-single-node-cluster
Best,
Sebastian
On 1/15/23 04:26, Mike wrote:
I will now try to configure the bot URL etc. before building,
but how and where do I change settings between crawls, e.g. the number of pages
per host?
Where do I configure Nutch in cluster mode?
thx, mike