Hi Lewis,

The good news is that with this override I no longer get the Neko parsing
error, but I see the following output in hadoop.log:

2013-01-09 06:29:43,738 INFO  parse.ParserJob - Parsing
http://www.ab-advisory.com/
2013-01-09 06:29:43,745 WARN  parse.ParserFactory - ParserFactory: Plugin:
org.apache.nutch.parse.tika.TikaParser mapped to contentType
application/xhtml+xml via parse$
2013-01-09 06:29:43,745 WARN  parse.ParserFactory - ParserFactory: Plugin:
org.apache.nutch.parse.tika.TikaParser mapped to contentType * via
parse-plugins.xml, but no$
2013-01-09 06:29:43,745 WARN  parse.ParseUtil - *No suitable parser found:
parser not found for contentType=application/xhtml+xml url=
http://www.ab-advisory.com/*
2013-01-09 06:29:46,466 WARN  mapred.FileOutputCommitter - Output path is
null in cleanup
2013-01-09 06:29:47,435 INFO  parse.ParserJob - ParserJob: success

It seems that Tika cannot parse HTML files? Am I wrong?
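
To double-check which plugin each MIME type is mapped to, here is a minimal sketch in Python. It assumes the file has the same shape as the override quoted below, wrapped in a parse-plugins root element. The SAMPLE string and the function name plugins_by_mime_type are made up for illustration and are not part of Nutch. It only lists the mapping; it does not tell whether the plugin is actually loaded or claims that content type.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the shape of the override quoted in this thread;
# a real parse-plugins.xml contains more entries than this.
SAMPLE = """
<parse-plugins>
  <mimeType name="text/html">
    <plugin id="parse-tika" />
  </mimeType>
  <mimeType name="application/xhtml+xml">
    <plugin id="parse-tika" />
  </mimeType>
</parse-plugins>
"""


def plugins_by_mime_type(xml_text):
    """Return {mimeType name: [plugin ids in declared order]}."""
    root = ET.fromstring(xml_text)
    return {
        mime.get("name"): [plugin.get("id") for plugin in mime.findall("plugin")]
        for mime in root.iter("mimeType")
    }


if __name__ == "__main__":
    # Print one line per MIME type with the plugin ids mapped to it.
    for mime_type, plugin_ids in plugins_by_mime_type(SAMPLE).items():
        print(mime_type, "->", ", ".join(plugin_ids))
```

To run it against a real file, read the file contents and pass them to plugins_by_mime_type in place of SAMPLE.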

kr, Arcondo


On Wed, Jan 9, 2013 at 12:22 AM, Lewis John Mcgibbney <
[email protected]> wrote:

> <mimeType name="text/html">
>                 <plugin id="parse-tika" />
> </mimeType>
> <mimeType name="application/xhtml+xml">
>                 <plugin id="parse-tika" />
> </mimeType>
>
