Re: Native Hadoop library not loaded and Cannot parse sites contents

Arcondo Dasilva Sat, 05 Jan 2013 01:11:22 -0800

Hi Alex,

I'm using Nutch 2.1 / HBase 0.90.6 / Solr 4.0.
Everything works fine, except that I'm not able to parse the contents of my URL
because of the error "Nekohtml not found".


My plugin.includes value looks like this:

<value>protocol-http|urlfilter-regex|parse-(xml|xhtml|html|tika|text|js)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic|lib-nekohtml</value>

I added lib-nekohtml at the end of the allowed values, but that seems to have
no effect on the error.
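
For completeness, this is roughly how the whole property would sit in
nutch-site.xml. It is only a sketch: the property name is the one from the
plugin.includes entry I posted earlier in this thread, the value is the line
above, and the <description> element is left out.

```xml
<!-- Sketch of the plugin.includes property in nutch-site.xml.
     The value is copied from this message; lib-nekohtml is the
     entry appended at the end. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(xml|xhtml|html|tika|text|js)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic|lib-nekohtml</value>
</property>
```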

In my runtime/local/plugins/lib-nekohtml directory, I have the jar file
nekohtml-0.9.5.jar.

Is there something else I should look for besides this?

Thanks a lot for your help.

Kr, Arcondo


On Fri, Jan 4, 2013 at 11:33 PM, <[email protected]> wrote:

> Which version of Nutch is this? Did you follow the tutorial? I can help
> you if you provide all the steps you took, starting with downloading Nutch.
>
> Alex.
>
> -----Original Message-----
> From: Arcondo Dasilva <[email protected]>
> To: user <[email protected]>
> Sent: Fri, Jan 4, 2013 1:23 pm
> Subject: Re: Native Hadoop library not loaded and Cannot parse sites
> contents
>
>
> Hi Alex,
>
> I tried that. It was the first thing I did, but without success.
> I don't understand why I'm obliged to use Neko instead of Tika. As far as I
> know, Tika can parse more than 1200 different formats.
>
> Kr, Arcondo
>
>
> On Fri, Jan 4, 2013 at 7:47 PM, <[email protected]> wrote:
>
> > Move or copy that jar file to local/lib and try again.
> >
> > hth.
> > Alex.
> >
> > -----Original Message-----
> > From: Arcondo <[email protected]>
> > To: user <[email protected]>
> > Sent: Fri, Jan 4, 2013 2:55 am
> > Subject: Re: Native Hadoop library not loaded and Cannot parse sites
> > contents
> >
> >
> > I hope you can see them now:
> >
> > Plugin folder
> > <http://lucene.472066.n3.nabble.com/file/n4030524/plugin_folder.png>
> >
> > Parse Job
> >
> > <http://lucene.472066.n3.nabble.com/file/n4030524/parse_job.png>
> >
> > Parse error : Hadoop.log
> >
> > <http://lucene.472066.n3.nabble.com/file/n4030524/parse_error.png>
> >
> > My nutch-site.xml (plugin.includes)
> >
> > <property>
> >  <name>plugin.includes</name>
> >  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
> >  <description>Regular expression naming plugin directory names to
> >   include.  Any plugin not matching this expression is excluded.
> >   In any case you need at least include the nutch-extensionpoints plugin.
> >   By default Nutch includes crawling just HTML and plain text via HTTP,
> >   and basic indexing and search plugins. In order to use HTTPS please
> >   enable protocol-httpclient, but be aware of possible intermittent problems
> >   with the underlying commons-httpclient library.
> >  </description>
> > </property>
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Native-Hadoop-library-not-loaded-and-Cannot-parse-sites-contents-tp4029542p4030524.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
