Hi Kaz,

At the very least, ivy.xml has to be changed to make Nutch 2 compatible with
Hadoop 2. Could you please publish or send me your final build scripts
(build.xml, default.properties, ivy.xml, and perhaps pom.xml, etc.)?
I'll try to reproduce your success.

BR,
Alex Median

On Sat, Oct 4, 2014 at 4:03 PM, k4200 <[email protected]> wrote:

> Hi Alex,
>
> > But information about other experiences with Nutch 2 + Hadoop 2 would also be welcome.
>
> I set up Nutch 2.3 + CDH 4.7 (HBase 0.94, Hadoop 2.0, etc.) a few months
> ago, and it's working fine.
>
> I used the latest code from svn with no modifications, and followed
> the tutorial below:
> http://wiki.apache.org/nutch/Nutch2Tutorial
>
> HTH,
> Kaz
>
> 2014-10-03 22:03 GMT+09:00 Alex Median <[email protected]>:
> >
> > Hi,
> >
> > For about a month now I have been working on installing Nutch 2.3 in
> > this configuration (see subject).
> > We initially chose Nutch 2 with Hadoop 1 a few months ago, and some of
> > the coding is already done.
> > We chose Amazon AWS Elastic MapReduce (EMR) as the platform.
> > Unfortunately, the Hadoop 1 version of EMR, running on an old Debian,
> > does not suit us. Therefore, we need Nutch 2 running in exactly this
> > configuration:
> > Hadoop 2.4.0 + HBase 0.94.18 (Amazon Linux AMI version: 3.2.1, Hadoop
> > distribution: Amazon 2.4.0, Applications: HBase 0.94.18)
> >
> > But information about other experiences with Nutch 2 + Hadoop 2 would also be welcome.
> >
> > What was done in the latest installation attempt on a local
> > machine:
> >
> > 1. Nutch 2.x
> > 1.1 Current 2.x version from svn
> > 1.2 Prepared build scripts:
> > 1.2.1 ivy.xml:
> > <dependency org="org.apache.hadoop" name="hadoop-common" rev="2.4.0">..
> > <dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-core" rev="2.4.0">..
> > <dependency org="org.apache.gora" name="gora" rev="0.5" conf="*->default" />
> > <dependency org="org.apache.gora" name="gora-hbase" rev="0.5" conf="*->default" />
> > etc.
> > 1.2.2 default.properties:
> > hadoop.version=2.4.0
> > version=2.3-SNAPSHOT
> > etc.
> > 1.3 Added public int getFieldsCount() { return Field.values().length; }
> > to ProtocolStatus.java, ParseStatus.java, Host.java, and WebPage.java.
> >
> > 2. HBase
> > 2.1 HBase 0.94.18 from svn
> > 2.2 Prepared for Protobuf 2.5.0 [1] (thanks also to Dobromyslov [5])
> > 2.3 Also generated hbase-0.94.18-hadoop-2.4.0.jar
> >
> > 3. Gora 0.5 (also tested with versions 0.4, 0.6-SNAPSHOT, and 0.5.3 from
> > com.argonio.gora)
> >
> > 4. Avro 1.7.6 (also tried versions 1.7.4 and 1.7.7)
> > 4.1 From svn
> > 4.2 Patched for AVRO-813 [2]
> > 4.3 Patched for AVRO-882 [3], then rolled back
> > 4.4 Patched as mentioned in [4]: commented out the EOFException thrown in
> > org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473),
> > etc.
> >
> > After weeks of investigating numerous exceptions, we made a number of
> > changes to the Nutch 2.x and Avro 1.7.6 code to suppress exceptions and
> > get a little further. We had some success: Nutch now more or less runs,
> > but it is unstable and produces incorrect results. All the stages we
> > need (inject, generate, fetch, parse, updatedb) complete in a cycle, but
> > some functionality is broken or ignored.
> > It seems that, because of our limited Nutch/Hadoop/HBase experience, we
> > broke the normal data exchange between Nutch and HBase (including Gora
> > and Avro). Perhaps some fields (and/or some data formats) are read and
> > written incorrectly. For example, many markers are lost and are
> > temporarily emulated in code so the steps can pass; data in the batchId
> > field is lost; scoring is also broken.
> >
> > Please help us! Perhaps working builds, scripts, and patches already
> > exist somewhere, or someone has already done this successfully. I'm
> > ready to publish all my diffs and exception traces.
> > Also, I would be very grateful if someone could tell me when the new
> > Nutch 2.3 release is expected; it seems that it will be
> > Hadoop 2-compatible.
> >
> > [1] http://hbase.apache.org/book/configuration.html
> > [2] https://issues.apache.org/jira/browse/AVRO-813
> > [3] https://issues.apache.org/jira/browse/AVRO-882
> >     http://mail-archives.apache.org/mod_mbox/avro-user/201108.mbox/%3ccaanh3_9_cqqbmt4vqyzg8-ikfo4nnlpcuzbbwd4kqoavpek...@mail.gmail.com%3E
> > [4] http://mail-archives.apache.org/mod_mbox/nutch-user/201409.mbox/%3cCAEmTxX9HrRM00SxerFAdRdZy=wVAd9xCchDTuLaxPQ=wi0q...@mail.gmail.com%3e
> > [5] http://stackoverflow.com/questions/13946725/configuring-hbase-standalone-mode-with-apache-nutch-java-lang-illegalargumente
> >     https://github.com/dobromyslov
> >
> > BR,
> > Alex Median
>
