Hello,

I recently attempted to install and run Nutch in a Cygwin environment, following the Nutch tutorial <https://cwiki.apache.org/confluence/display/NUTCH/NutchTutorial>. However, when executing the first crawl command (bin/nutch inject crawl/crawldb urls), I encountered the following error:
...
2024-11-19 20:21:21,246 INFO o.a.n.c.Injector [main] Injector: crawlDb: crawl/crawldb
2024-11-19 20:21:21,246 INFO o.a.n.c.Injector [main] Injector: urlDir: urls
2024-11-19 20:21:21,246 INFO o.a.n.c.Injector [main] Injector: Converting injected urls to crawl db entries.
2024-11-19 20:21:21,948 ERROR o.a.n.c.Injector [main] Injector: java.lang.RuntimeException: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:788)
    at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:297)
    at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:313)
...

I suspect this might be a common issue, but I couldn’t locate any information addressing it for recent versions of Nutch. Is this a tested use case, or could this potentially be a regression?

Additionally, are there verified steps for setting up and running Nutch on Cygwin? If not, would you recommend an alternative approach for Windows, such as WSL2, containers, or another solution?

Thank you for your assistance!

Best regards,
John

