Hi John,

I haven't tried to run Nutch on Cygwin for over a decade, so I'm pretty rusty 
on this. I am 100% sure that the project doesn't currently have any 
continuous integration or builds running on Windows; however, I suppose this is 
something we could look into!

I found a few articles on the Nutch wiki:
* https://cwiki.apache.org/confluence/display/nutch/GettingNutchRunningWithWindows
* https://cwiki.apache.org/confluence/display/NUTCH/GettingNutchRunningOnCygwin

... as you can see, they were moved to an "Archive and Legacy" section of the 
wiki.

That being said nothing 

On 2024/11/22 00:40:36 John Whelan wrote:

> 2024-11-19 20:21:21,948 ERROR o.a.n.c.Injector [main] Injector:
> java.lang.RuntimeException: java.io.FileNotFoundException:
> java.io.FileNotFoundException:
>  HADOOP_HOME and hadoop.home.dir are unset. -see
> https://wiki.apache.org/hadoop/WindowsProblems
>         at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:788)
>         at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:297)
>         at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:313)
> ...

If you take a look at 
https://cwiki.apache.org/confluence/display/HADOOP2/WindowsProblems you will 
see that you can likely add the hadoop.home.dir property to nutch-site.xml with 
the value set to the base directory of your Nutch crawler.
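
If that doesn't take effect, a small standalone check like the one below might 
help narrow things down. It is only a sketch and is not part of Nutch or 
Hadoop: the class name HadoopHomeCheck and its methods are made up for 
illustration. It assumes, based on the error message in your stack trace, that 
Hadoop looks at the hadoop.home.dir JVM system property first, then the 
HADOOP_HOME environment variable, and expects bin\winutils.exe underneath that 
directory.

```java
import java.io.File;

public class HadoopHomeCheck {

    // Assumed lookup order: JVM system property first, then environment variable.
    static String resolveHadoopHome(String sysProp, String envVar) {
        return sysProp != null ? sysProp : envVar;
    }

    // Location where winutils.exe is expected, relative to the resolved home.
    static File winutilsPath(String home) {
        return new File(new File(home, "bin"), "winutils.exe");
    }

    public static void main(String[] args) {
        String home = resolveHadoopHome(
                System.getProperty("hadoop.home.dir"),
                System.getenv("HADOOP_HOME"));

        if (home == null || home.isEmpty()) {
            System.out.println("Neither hadoop.home.dir nor HADOOP_HOME is set.");
            return;
        }

        File winutils = winutilsPath(home);
        System.out.println("Resolved Hadoop home: " + home);
        System.out.println("Expected winutils at: " + winutils.getPath());
        System.out.println("winutils.exe present: " + winutils.isFile());
    }
}
```

You could run it with the same settings you intend to give Nutch, for example 
by passing -Dhadoop.home.dir=<your Nutch base directory> on the java command 
line, to see which value is actually picked up.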

> 
> I suspect this might be a common issue, but I couldn’t locate any
> information addressing it for recent versions of Nutch. Is this a tested
> use case, or could this potentially be a regression?

It is NOT a tested use case. It MAY be a regression. 

> 
> Additionally, are there verified steps for setting up and running Nutch on
> Cygwin? If not, would you recommend an alternative approach for Windows,
> such as WSL2, containers, or another solution?
> 

We do publish a Nutch Docker image, which you may wish to investigate:
https://hub.docker.com/r/apache/nutch

Hope this helps
lewismc
