Hi Scott

EMR instances come with Lucene jars that might conflict with the ones used
by Nutch. One (brutal) option is to simply remove the ones preinstalled on
the slave nodes, but it should also be possible to configure Hadoop so that
it loads user jars before the system ones.
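
Before removing anything, it may help to confirm which Lucene artifacts
actually clash. The following is only a rough sketch, not something taken
from Nutch or EMR: the function names are made up for illustration, and it
assumes jars follow the usual lucene-<module>-<version>.jar naming (names
with extra suffixes such as -SNAPSHOT are skipped).

```python
import re
from pathlib import Path

# Matches names such as lucene-core-3.6.2.jar -> ("lucene-core", "3.6.2").
# Assumption: the version starts with a digit and contains no hyphen.
JAR_RE = re.compile(r"^(lucene-[a-z0-9-]+?)-(\d[\w.]*)\.jar$")


def lucene_versions(directory):
    """Map each Lucene artifact name to the set of versions found under directory."""
    found = {}
    for jar in Path(directory).rglob("lucene-*.jar"):
        match = JAR_RE.match(jar.name)
        if match:
            found.setdefault(match.group(1), set()).add(match.group(2))
    return found


def conflicts(system_dir, user_dir):
    """Return artifacts present in both trees whose version sets differ."""
    system = lucene_versions(system_dir)
    user = lucene_versions(user_dir)
    return {
        name: (sorted(system[name]), sorted(user[name]))
        for name in system.keys() & user.keys()
        if system[name] != user[name]
    }
```

Calling conflicts() with the Hadoop lib directory on a slave node and the
lib directory of the Nutch build (both paths depend on the installation)
returns a dict keyed by artifact name, with the system versions first and
the user versions second. An empty dict means no overlapping Lucene jar
differs in version.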

Julien

On 5 May 2015 at 19:34, Scott Lundgren <slundg...@qsfllc.com> wrote:

> When I run my crawl in hadoop I’m getting the below error. Googling
> suggests there’s a version conflict between Lucene jars. How do I fix it?
>
> attempt_201505041850_0035_r_000001_0: MahoutInterestClassifierPlugin
> startUp complete!
> 15/05/05 17:23:23 INFO mapred.JobClient: Task Id :
> attempt_201505041850_0035_r_000004_0, Status : FAILED
> Error: LUCENE_36
>
> Nutch 1.9 was set up on Amazon EMR 1.0.3 (Hadoop 1.x) by ssh’ing into the
> master and compiling the source. I defined the properties elastic.host,
> elastic.port and elastic.index in nutch-site.xml, then ran ant to build the
> jar.
>
> Elasticsearch 1.3.4 was installed on the master by fetching the Debian
> package from elastic.co and installing it via dpkg. The
> elasticsearch service was started, then I created an index matching the
> value defined in elastic.index.
>
> Scott Lundgren
> Software Engineer
> (704) 973-7388
> slundg...@qsfllc.com
>
> QuietStream Financial, LLC<http://www.quietstreamfinancial.com>
> 11121 Carmel Commons Boulevard | Suite 250
> Charlotte, North Carolina 28226


-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
