Re: parse.ParserFactory

Lewis John Mcgibbney Tue, 22 May 2012 14:28:05 -0700

Unless you're using Nutch 1.2 or earlier, you should not list
msexcel|mspowerpoint|msword|oo|pdf in your plugin.includes. All
of these document formats are (and have been for some time)
implemented as Apache Tika parsers, so parse-tika already handles them.
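To illustrate the advice, here is a small sketch that strips the legacy per-format parsers from the plugin.includes value quoted below and checks which plugin ids the result still enables. It assumes Nutch treats plugin.includes as a regular expression that must match the whole plugin id. The helper names drop_legacy_parsers and is_enabled are hypothetical and not part of Nutch.

```python
import re

# plugin.includes value from the original question.
OLD = ("protocol-http|urlfilter-regex|"
       "parse-(html|tika|js|msexcel|mspowerpoint|msword|oo|pdf|swf|zip)|"
       "index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)")

# Legacy per-format parsers whose formats Tika now covers.
LEGACY = {"msexcel", "mspowerpoint", "msword", "oo", "pdf"}


def drop_legacy_parsers(value: str) -> str:
    """Hypothetical helper: remove legacy names from the parse-(...) group."""
    def fix(m: re.Match) -> str:
        kept = [p for p in m.group(1).split("|") if p not in LEGACY]
        return "parse-(" + "|".join(kept) + ")"
    return re.sub(r"parse-\(([^)]*)\)", fix, value)


def is_enabled(value: str, plugin_id: str) -> bool:
    # Assumption: a plugin is enabled only if its entire id matches the regex.
    return re.fullmatch(value, plugin_id) is not None


NEW = drop_legacy_parsers(OLD)
print(NEW)
print(is_enabled(NEW, "parse-tika"), is_enabled(NEW, "parse-pdf"))
```

With the cleaned value, parse-tika stays enabled while parse-pdf no longer matches, so PDFs are parsed by Tika alone.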


hth



On Tue, May 22, 2012 at 9:20 PM, Tolga <[email protected]> wrote:
> Hi,
>
> I crawl / index PDF files just fine, but I get the following warning.
>
> parse.ParserFactory - ParserFactory: Plugin: parse-pdf mapped to contentType
> application/pdf via parse-plugins.xml, but not enabled via plugin.includes
> in nutch-default.xml.
>
> I've got the value
> protocol-http|urlfilter-regex|parse-(html|tika|js|msexcel|mspowerpoint|msword|oo|pdf|swf|zip)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)
> for plugin.includes property in nutch-default.xml. What am I missing?
>
> Regards,



-- 
Lewis
