The problem was caused by mismatched JDK versions. I had to use 1.8 to compile Ant and Nutch, but Cloudera Manager defaults to JDK 1.6 or 1.7. The solution is to override JAVA_HOME in the hosts' configuration and restart the cluster.
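
To confirm that a host actually picked up the new JAVA_HOME, one option is to ask its JVM which class-file versions it accepts. The sketch below (JvmCheck is a hypothetical name, not part of Nutch or Cloudera) prints the standard java.home, java.version and java.class.version system properties and exits with status 1 if the JVM cannot load classes compiled for Java 8 (class-file major version 52). Compiling it with a JDK 8 javac using `javac -source 1.6 -target 1.6 JvmCheck.java` should let the check itself start on an older JVM instead of failing with the same UnsupportedClassVersionError. Run it with the java binary under the JAVA_HOME you configured (e.g. `$JAVA_HOME/bin/java JvmCheck`); running plain `java` only checks whatever is first on the PATH.

```java
/** Hypothetical check: reports the running JVM and fails if it cannot load Java 8 class files. */
public class JvmCheck {
    // Class files compiled by a Java 8 compiler carry major version 52.
    static final int REQUIRED_MAJOR = 52;

    /** Returns true if a JVM reporting this java.class.version (e.g. "52.0") can load the required major version. */
    static boolean supports(String classVersion, int requiredMajor) {
        int major = Integer.parseInt(classVersion.split("\\.")[0]);
        return major >= requiredMajor;
    }

    public static void main(String[] args) {
        String classVersion = System.getProperty("java.class.version");
        System.out.println("java.home          = " + System.getProperty("java.home"));
        System.out.println("java.version       = " + System.getProperty("java.version"));
        System.out.println("java.class.version = " + classVersion);
        if (!supports(classVersion, REQUIRED_MAJOR)) {
            System.err.println("This JVM cannot load classes compiled for Java 8 (major version 52).");
            System.exit(1);
        }
    }
}
```

A non-zero exit status makes the check easy to loop over every host in a shell script.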

On 08/15/2017 11:51 PM, Michael Chen wrote:
Hi,

I just figured out how to deploy jobs to Hadoop with the jar file, but I ran into an error during the first step, injection:

java.lang.UnsupportedClassVersionError: org/apache/gora/mapreduce/GoraOutputFormat: Unsupported major.minor version 52.0
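
For context: "major.minor version 52.0" is the class-file format version, and 52 is the major version produced by a Java 8 compiler (51 is Java 7, 50 is Java 6). A JVM older than Java 8 refuses to load such a class and throws exactly this error. As a minimal sketch (ClassVersion is a hypothetical helper, not part of Nutch or Gora), the major version can be read from the first eight bytes of any .class file; the JDK's own `javap -v` reports the same value.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: prints the class-file version of each .class file given on the command line. */
public class ClassVersion {
    // Every valid class file starts with this magic number.
    private static final int MAGIC = 0xCAFEBABE;

    /** Parses the 8-byte class-file header (magic, minor, major) and returns the major version. */
    static int majorVersion(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default, as the class-file format requires
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort(); // skip the minor version
        return Short.toUnsignedInt(buf.getShort());
    }

    /** Maps a major version to a Java release number; valid for Java 5 (major 49) and later. */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            byte[] header = new byte[8];
            try (DataInputStream in = new DataInputStream(Files.newInputStream(Paths.get(path)))) {
                in.readFully(header); // read only the header, not the whole file
            }
            int major = majorVersion(header);
            System.out.printf("%s: major version %d (Java %d)%n", path, major, javaRelease(major));
        }
    }
}
```

For example, after extracting org/apache/gora/mapreduce/GoraOutputFormat.class from the job jar, running `java ClassVersion GoraOutputFormat.class` should report major version 52 (Java 8), matching the error above.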

I'm using Nutch 2.x, which is configured for Gora 0.7, and I set hbase-common to 1.2.3 to be consistent with the hbase-client and hbase-protocol libraries.

Cloudera 5.12 (the latest version) runs Hadoop 2.6.0, HBase 1.2.0, ZooKeeper 3.4.5, and Solr 4.10.3.

Does anyone know what causes this error?

Thanks!

Michael


On 08/14/2017 07:29 PM, Divjot Singh wrote:
Hi Michael

I am using the latest Cloudera release and it's working fine. You can use any Linux distro you are comfortable with. CentOS is mostly used for server
deployments and is quite stable.

Thanks
Divjot


On 15-Aug-2017 2:09 AM, "Michael Chen" <yiningchen2...@u.northwestern.edu>
wrote:

Hi Divjot,

Thanks for the information! I was wondering if there is a specific version
of Cloudera Manager and CDH that works best with Nutch 2.x (HBase 1.2.3,
Hadoop 2.5.2)?

Also, is there a specific reason to use CentOS 7 instead of Amazon Linux or
Red Hat?

I’ll try to get started with the setup. Thanks!

Michael

From: Divjot Singh
Sent: Tuesday, August 8, 2017 04:06
To: user@nutch.apache.org
Subject: Re: Best practice for Nutch 2.x on AWS?

Hi

We have HBase set up on an AWS cluster running CentOS 7. The setup was
done using Cloudera Manager. Nutch can then be run in standalone mode or
on YARN by running the deployment jar in the deploy folder.

I have not tested with S3 directly, but you can always back up the HBase
data to S3 daily.

Hope this helps. Let me know if you have further questions.

Divjot


On Sun, Aug 6, 2017 at 5:59 AM, Michael Chen <
yiningchen2...@u.northwestern.edu> wrote:

Hi,

I'm trying to set up Nutch 2.x on AWS EC2 clusters, and I was wondering if
anyone knows of a "best setup" for it. The Hadoop and HBase versions in
current EMR releases don't seem to work with Nutch 2.x. Does it sound
like a good idea to manually set up Hadoop clusters and then run Nutch on them? Will I be able to use S3 as data storage so that I can keep the data
when an EC2 instance stops?

Any suggestions would be very helpful!

Thanks in advance,

Michael
