Hi John,

the current master has been upgraded to the new MapReduce API (NUTCH-2375). It was a huge change and is already known to have introduced some issues. For production, it's recommended to use 1.14 and patch it if necessary.
Could you open a new issue on https://issues.apache.org/jira/projects/NUTCH and provide the detailed stack trace there?

Thanks,
Sebastian

On 03/16/2018 01:45 PM, John Thornton wrote:
> Hello,
>
> I'm currently running Nutch under Amazon EMR 5.12.0 with Hadoop 2.8.3 using
> S3 (EMRFS) as the filesystem. If I build the latest version from the
> master branch and run a crawl in distributed mode, I get a fetcher error
> like fetcher.Fetcher: Fetcher: java.lang.IllegalArgumentException: Wrong
> FS: s3:..., expected: hdfs://...
>
> This problem was reported in NUTCH-2494 and fixed in PR-274, and indeed when
> I run the same crawl using a build of commit 87c7a2e, it works with no
> error. So my question is: has a regression been introduced, or am I missing
> something?
>
> Regards,
>
> John