Hi John,

I haven't tried to run Nutch on Cygwin for over a decade, so I'm pretty rusty on this. I am 100% sure that the project doesn't currently have any continuous integration or builds running on Windows; however, I suppose this is something we could look into!
I found a few articles on the Nutch wiki

* https://cwiki.apache.org/confluence/display/nutch/GettingNutchRunningWithWindows
* https://cwiki.apache.org/confluence/display/NUTCH/GettingNutchRunningOnCygwin

... as you can see, they were moved to an "Archive and Legacy" section of the wiki. That being said nothing

On 2024/11/22 00:40:36 John Whelan wrote:
> 2024-11-19 20:21:21,948 ERROR o.a.n.c.Injector [main] Injector:
> java.lang.RuntimeException: java.io.FileNotFoundException:
> java.io.FileNotFoundException:
> HADOOP_HOME and hadoop.home.dir are unset. -see
> https://wiki.apache.org/hadoop/WindowsProblems
> at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:788)
> at
> org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:297)
> at
> org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:313)
> ...

If you take a look at https://cwiki.apache.org/confluence/display/HADOOP2/WindowsProblems you will see that you can likely add the hadoop.home.dir property to nutch-site.xml with the value set to the base directory of your Nutch crawler.

>
> I suspect this might be a common issue, but I couldn’t locate any
> information addressing it for recent versions of Nutch. Is this a tested
> use case, or could this potentially be a regression?

It is NOT a tested use case. It MAY be a regression.

>
> Additionally, are there verified steps for setting up and running Nutch on
> Cygwin? If not, would you recommend an alternative approach for Windows,
> such as WSL2, containers, or another solution?
>

We do publish a Nutch Docker Container image which you may wish to investigate.
https://hub.docker.com/r/apache/nutch

Hope this helps
lewismc
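
P.S. Before changing any configuration, it may help to confirm what the stack trace above is complaining about. The sketch below is only a rough diagnostic and is untested on Cygwin. It assumes the layout described on the WindowsProblems page, where winutils.exe sits in the bin directory under HADOOP_HOME. The function name find_winutils is made up for illustration, and the script only looks at the HADOOP_HOME environment variable; it cannot see a hadoop.home.dir value passed to the JVM as a system property.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_winutils(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the path to bin/winutils.exe under HADOOP_HOME, or None.

    Hypothetical helper: it mirrors the check implied by the error message,
    it is not part of Nutch or Hadoop.
    """
    home = env.get("HADOOP_HOME", "").strip()
    if not home:
        # Corresponds to the "HADOOP_HOME ... unset" part of the error.
        return None
    candidate = Path(home) / "bin" / "winutils.exe"
    # The variable can be set and still point at a directory without winutils.
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_winutils()
    if found:
        print(f"winutils.exe found at {found}")
    else:
        print("HADOOP_HOME is unset or bin/winutils.exe is missing")
```

If this reports that nothing was found, the Injector error is expected regardless of the Nutch version in use.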

