Hi John,

The master branch was recently upgraded to the new MapReduce API
(NUTCH-2375). It was a huge change that is already known to have
introduced some issues. For production, it's recommended to use 1.14
and patch it if necessary.

Could you open a new issue on
      https://issues.apache.org/jira/projects/NUTCH
and provide the full stack trace there?

Thanks,
Sebastian

On 03/16/2018 01:45 PM, John Thornton wrote:
> Hello,
> 
> I'm currently running Nutch under Amazon EMR 5.12.0 with Hadoop 2.8.3 using
> S3 (EMRFS) as the filesystem.  If I build the latest version from the
> master branch and run a crawl in distributed mode I get a fetcher error
> like fetcher.Fetcher: Fetcher: java.lang.IllegalArgumentException: Wrong
> FS: s3:..., expected: hdfs://...
> 
> This problem was reported in NUTCH-2494 and fixed in PR-274 and indeed when
> I run the same crawl using a build of commit 87c7a2e it works with no
> error.  So my question is has a regression been introduced, or am I missing
> something?
> 
> Regards,
> 
> John
> 
