Hi Michael,

Nutch (1.18, and trunk/master) should work with more recent Hadoop
versions.

At Common Crawl we use a modified Nutch version based on the recent trunk
running on Hadoop 3.2.2 (soon 3.2.3) and Java 11, even on a mixed Hadoop cluster
with x64 and arm64 AWS EC2 instances.

But I'm sure other combinations are possible as well.

One important note: in trunk/master there is an as-yet unresolved regression
caused by the newly introduced plugin-based URL stream handlers, see NUTCH-2936
and NUTCH-2949. Until these are resolved, you need to undo the commits that
introduced these handlers in order to run Nutch (built from trunk/master) in
distributed mode.

Best,
Sebastian

On 6/13/22 01:37, Michael Coffey wrote:
> Do current 1.x versions of Nutch (1.18, and trunk/master) work with versions 
> of Hadoop greater than 3.1.3? I ask because Hadoop 3.1.3 is from October 
> 2019, and there are many newer versions available. For example, 3.1.4 came 
> out in 2020, and there are 3.2.x and 3.3.x versions that came out this year.
> 
> I don’t care about newer features in Hadoop, I just have general concerns 
> about stability and security. I am working on reviving an old project and 
> would like to put together the best possible infrastructure for the future.
> 
> 
