Hi Mike,

> It can be tedious to set up for the first time, and there are many components.

If you prefer Linux packages, I can recommend Apache Bigtop, see
   https://bigtop.apache.org/
and for the list of package repositories
   https://downloads.apache.org/bigtop/stable/repos/

~Sebastian

On 1/15/23 01:06, Markus Jelsma wrote:
Hello Mike,

> Would it pay off for me to put a Hadoop cluster on top of the 3 servers?

Yes, for all the reasons Hadoop exists. It can be tedious to set up for the
first time, and there are many components. But at least you have three
servers, which is more or less required by ZooKeeper, which you will also
need.
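
For reference, a minimal ZooKeeper ensemble configuration for three servers
could look roughly like this (hostnames and paths are just placeholders):

   # zoo.cfg, identical on all three servers
   tickTime=2000
   initLimit=10
   syncLimit=5
   dataDir=/var/lib/zookeeper
   clientPort=2181
   server.1=node1:2888:3888
   server.2=node2:2888:3888
   server.3=node3:2888:3888

   # each server also needs a myid file holding its own id (1, 2 or 3)
   echo 1 > /var/lib/zookeeper/myid   # on node1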

Ideally you would have some additional VMs to run the controlling Hadoop
programs and perhaps the Hadoop client nodes on. The workers can run on
bare metal.
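
As a rough sketch of such a layout (hostnames and the port are only examples,
not a recommendation):

   # hadoop-master (VM) : NameNode, ResourceManager, JobHistoryServer
   # nutch-client (VM)  : Nutch home dir(s), job submission only
   # node1..node3       : DataNode + NodeManager on the bare-metal servers

   # etc/hadoop/workers on the cluster nodes:
   node1
   node2
   node3

   # core-site.xml : fs.defaultFS = hdfs://hadoop-master:8020
   # yarn-site.xml : yarn.resourcemanager.hostname = hadoop-master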

> 1.) A server would not be integrated directly into the crawl process as a
> master.

What do you mean? Can you elaborate?

> 2.) Can I run multiple crawl jobs on one server?

Yes! Just keep separate Nutch home directories on your Hadoop client nodes,
each with its own configuration, for example as sketched below.
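
A minimal sketch of what that could look like on a client node (paths, seed
dirs and the bin/crawl options are just examples; the exact flags depend on
your Nutch version):

   # two independent Nutch runtime copies, each with its own conf/
   /opt/nutch-jobA/conf/nutch-site.xml   # settings for crawl A
   /opt/nutch-jobB/conf/nutch-site.xml   # settings for crawl B

   # launch each crawl from its own home dir
   cd /opt/nutch-jobA && bin/crawl -i -s urls-A crawl-A 5
   cd /opt/nutch-jobB && bin/crawl -i -s urls-B crawl-B 5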

Regards,
Markus

On Sat, 14 Jan 2023 at 18:42, Mike <mz579...@gmail.com> wrote:

Hi!

I am now crawling the internet in local mode, in parallel, with up to 10
instances on 3 computers. Would it pay off for me to put a Hadoop cluster
on top of the 3 servers?

1.) A server would not be integrated directly into the crawl process as a
master.
2.) Can I run multiple crawl jobs on one server?

Thanks

