Hi Michael,

Nutch (1.18, and trunk/master) should work together with more recent Hadoop versions.
At Common Crawl we use a modified Nutch version based on the recent trunk, running on Hadoop 3.2.2 (soon 3.2.3) and Java 11, even on a mixed Hadoop cluster with x64 and arm64 AWS EC2 instances. But I'm sure other combinations are possible.

One important note: in trunk/master there is an as yet unresolved regression caused by the newly introduced plugin-based URL stream handlers, see NUTCH-2936 and NUTCH-2949. Until these are resolved, you need to undo these commits in order to run Nutch (built from trunk/master) in distributed mode.

Best,
Sebastian

On 6/13/22 01:37, Michael Coffey wrote:
> Do current 1.x versions of Nutch (1.18, and trunk/master) work with versions
> of Hadoop greater than 3.1.3? I ask because Hadoop 3.1.3 is from October
> 2019, and there are many newer versions available. For example, 3.1.4 came
> out in 2020, and there are 3.2.x and 3.3.x versions that came out this year.
>
> I don’t care about newer features in Hadoop, I just have general concerns
> about stability and security. I am working on reviving an old project and
> would like to put together the best possible infrastructure for the future.

