Hi Mike,

The Nutch configuration files are included in the job file found in runtime/deploy after the build. This means that to run Nutch
in "distributed" mode you need to compile it yourself, and rebuild the job file whenever you change the configuration.
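
As a rough sketch of that "configure, then rebuild" cycle: the small Python script below renders a nutch-site.xml from a dict of overrides. The property names http.agent.name, generate.count.mode and generate.max.count are the standard Nutch ones for the bot name and the per-host limit in a fetch list. The bot name and the limit of 100 are placeholders I made up, and the helper render_nutch_site is hypothetical, not part of Nutch.

```python
import xml.etree.ElementTree as ET


def render_nutch_site(props):
    """Return the text of a nutch-site.xml holding the given properties."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


# Placeholder values, adjust them to your crawl.
overrides = {
    "http.agent.name": "MyTestBot",  # made-up bot name
    "generate.count.mode": "host",   # count URLs per host
    "generate.max.count": 100,       # max. URLs per host in one fetch list
}

if __name__ == "__main__":
    # Redirect the output to conf/nutch-site.xml in the Nutch source tree.
    print(render_nutch_site(overrides))
```

After writing the output to conf/nutch-site.xml, run "ant runtime" again so that the job file in runtime/deploy picks up the new values. Note that generate.max.count limits the URLs per host in a single fetch list, not over the whole crawl.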

To practice, you can first work in "pseudo-distributed" mode, i.e.
on a single-node Hadoop cluster. All commands are the same as in fully distributed mode.

If it helps, I prepared some setup scripts to run Nutch in pseudo-distributed 
mode:
  https://github.com/sebastian-nagel/nutch-test-single-node-cluster

Best,
Sebastian

On 1/15/23 04:26, Mike wrote:
I will now try to configure the bot URL etc. before building,
but how and where do I configure settings between crawls, e.g. the number of pages
per host?

Where do I configure Nutch in cluster mode?

thx, mike
