Hi Alex,

I'm using Nutch 2.1 / HBase 0.90.6 / Solr 4.0. Everything works fine, except that I'm not able to parse the contents of my URLs because of the error "Nekohtml not found".

My plugin.includes value looks like this:

<value>protocol-http|urlfilter-regex|parse-(xml|xhtml|html|tika|text|js)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic|lib-nekohtml</value>

I added lib-nekohtml at the end of the allowed values, but that seems to have no effect on the error. In my runtime/local/plugins/lib-nekohtml directory I have the jar file nekohtml-0.9.5.jar.

Is there anything else I should look for?

Thanks a lot for your help.

Kr, Arcondo

On Fri, Jan 4, 2013 at 11:33 PM, <[email protected]> wrote:

> Which version of Nutch is this? Did you follow the tutorial? I can help
> you if you provide all the steps you did, starting with downloading Nutch.
>
> Alex.
>
> -----Original Message-----
> From: Arcondo Dasilva <[email protected]>
> To: user <[email protected]>
> Sent: Fri, Jan 4, 2013 1:23 pm
> Subject: Re: Native Hadoop library not loaded and Cannot parse sites
> contents
>
> Hi Alex,
>
> I tried. That was the first thing I did, but without success.
> I don't understand why I'm obliged to use Neko instead of Tika. As far as I
> know, Tika can parse more than 1200 different formats.
>
> Kr, Arcondo
>
> On Fri, Jan 4, 2013 at 7:47 PM, <[email protected]> wrote:
>
> > move or copy that jar file to local/lib and try again.
> >
> > hth.
> > Alex.
> >
> > -----Original Message-----
> > From: Arcondo <[email protected]>
> > To: user <[email protected]>
> > Sent: Fri, Jan 4, 2013 2:55 am
> > Subject: Re: Native Hadoop library not loaded and Cannot parse sites
> > contents
> >
> > Hope that now you can see them
> >
> > Plugin folder
> > <http://lucene.472066.n3.nabble.com/file/n4030524/plugin_folder.png>
> >
> > Parse Job
> > <http://lucene.472066.n3.nabble.com/file/n4030524/parse_job.png>
> >
> > Parse error : Hadoop.log
> > <http://lucene.472066.n3.nabble.com/file/n4030524/parse_error.png>
> >
> > My nutch-site.xml (plugin includes)
> >
> > <property>
> > <name>plugin.includes</name>
> > <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
> > <description>Regular expression naming plugin directory names to
> > include. Any plugin not matching this expression is excluded.
> > In any case you need at least include the nutch-extensionpoints plugin.
> > By default Nutch includes crawling just HTML and plain text via HTTP,
> > and basic indexing and search plugins. In order to use HTTPS please enable
> > protocol-httpclient, but be aware of possible intermittent problems with the
> > underlying commons-httpclient library.
> > </description>
> > </property>
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Native-Hadoop-library-not-loaded-and-Cannot-parse-sites-contents-tp4029542p4030524.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.

