Hi Alex,

Just a quick question: why are you using Gora from com.argonio.gora? Why
don't you use the Apache one, org.apache.gora? And could you tell us what
exactly is going wrong?


Renato M.

2014-10-04 14:03 GMT+02:00 k4200 <[email protected]>:

>  Hi Alex,
>
> > But info about other experiences with Nutch2+Hadoop2 would also be good.
>
> I set up Nutch 2.3 + CDH 4.7 (HBase 0.94, Hadoop 2.0 etc) a few months
> ago, and it's working fine.
>
> I used the latest code from svn with no modifications, and followed
> the tutorial below:
> http://wiki.apache.org/nutch/Nutch2Tutorial
>
> HTH,
> Kaz
>
> 2014-10-03 22:03 GMT+09:00 Alex Median <[email protected]>:
> >
> > Hi,
> >
> > For about a month I have been trying to install Nutch 2.3 in this
> > configuration (see subject).
> > We chose Nutch 2, initially with Hadoop 1, a few months ago, and some of
> > the coding is already done.
> > We chose Amazon AWS Elastic MapReduce (EMR) as the platform.
> > Unfortunately, the EMR Hadoop 1 version on an old Debian does not suit us.
> > Therefore, we need to set up Nutch 2 in exactly this configuration:
> > Hadoop 2.4.0 + HBase 0.94.18 (Amazon Linux: AMI version:3.2.1, Hadoop
> > distribution:Amazon 2.4.0, Applications:HBase 0.94.18)
> >
> > But info about other experiences with Nutch2+Hadoop2 would also be good.
> >
> > What was done in the latest iteration of the installation on a local
> > computer:
> >
> > 1. Nutch 2.x
> > 1.1 svn current 2.x version
> > 1.2. prepared scripts:
> > 1.2.1 ivy:
> > <dependency org="org.apache.hadoop" name="hadoop-common" rev="2.4.0">..
> > <dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-core" rev="2.4.0">..
> > <dependency org="org.apache.gora" name="gora" rev="0.5" conf="*->default" />
> > <dependency org="org.apache.gora" name="gora-hbase" rev="0.5" conf="*->default" />
> > etc.
> > 1.2.2 default.properties:
> > hadoop.version=2.4.0
> > version=2.3-SNAPSHOT
> > etc.
> > 1.3. added public int getFieldsCount() { return Field.values().length; }
> > to ProtocolStatus.java, ParseStatus.java, Host.java, WebPage.java.
> >
> > 2. HBase
> > 2.1 svn HBase 0.94.18
> > 2.2 prepared for Protobuf 2.5.0 [1], also thanks to Dobromyslov [5]
> > 2.3 also generated hbase-0.94.18-hadoop-2.4.0.jar
> >
> > 3. Gora 0.5 (also tested with versions 0.4, 0.6-SNAPSHOT, and 0.5.3
> > from com.argonio.gora)
> >
> > 4. Avro 1.7.6 (also played with versions 1.7.4, 1.7.7)
> > 4.1 svn
> > 4.2 patched for AVRO-813[2]
> > 4.3 patched for AVRO-882[3], then rolled back
> > 4.4 patched as mentioned in [4]: commented out the throwing of
> > EOFException in
> > org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473),
> > etc.
> >
> > After investigating numerous exceptions over many weeks, we made a
> > number of changes to the Nutch 2.x and Avro 1.7.6 code to suppress
> > exceptions and get a little further. We had some success: Nutch more or
> > less runs, but it is unstable and behaves incorrectly. All the stages we
> > need (inject, generate, fetch, parse, updatedb) pass in a cycle, but some
> > functionality is broken and ignored.
> > It seems that, because of our limited experience with Nutch/Hadoop/HBase,
> > we broke the normal data exchange between Nutch and HBase (also involving
> > Gora and Avro). Perhaps some of the fields (and/or some of the data
> > formats) are read and written incorrectly. For example, many markers are
> > lost and temporarily emulated in code to get through the steps; data in
> > the batchId field is lost; scoring is also broken.
> >
> > Please help us! Perhaps working builds and/or scripts and patches
> > already exist somewhere. Maybe someone has had a positive experience
> > with this. I'm ready to publish all my diffs and exception traces.
> > Also, I would be very grateful if someone could tell me when we can
> > expect a new Nutch 2.3 release; it seems that it will be
> > Hadoop 2-compatible.
> >
> > [1] http://hbase.apache.org/book/configuration.html
> > [2] https://issues.apache.org/jira/browse/AVRO-813
> > [3] https://issues.apache.org/jira/browse/AVRO-882
> >     http://mail-archives.apache.org/mod_mbox/avro-user/201108.mbox/%3ccaanh3_9_cqqbmt4vqyzg8-ikfo4nnlpcuzbbwd4kqoavpek...@mail.gmail.com%3E
> > [4] http://mail-archives.apache.org/mod_mbox/nutch-user/201409.mbox/%3cCAEmTxX9HrRM00SxerFAdRdZy=wVAd9xCchDTuLaxPQ=wi0q...@mail.gmail.com%3e
> > [5] http://stackoverflow.com/questions/13946725/configuring-hbase-standalone-mode-with-apache-nutch-java-lang-illegalargumente
> >     https://github.com/dobromyslov
> >
> > BR,
> > Alex Median
>
