Unless you're using Nutch 1.2 or earlier, you should not have msexcel|mspowerpoint|msword|oo|pdf in your plugin.includes. All of these document formats are (and have been for some time) handled by Apache Tika parsers, i.e. by parse-tika.
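
As a rough sketch, this is the plugin.includes value quoted in this thread with only those five entries dropped and everything else left untouched. Whether the remaining entries (js, swf, zip) are needed is an assumption that depends on your setup, and local overrides are usually better placed in nutch-site.xml than in nutch-default.xml:

```xml
<!-- Sketch only: the quoted value minus msexcel|mspowerpoint|msword|oo|pdf.
     parse-tika stays in the list so Tika can handle those formats. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika|js|swf|zip)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```
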
hth

On Tue, May 22, 2012 at 9:20 PM, Tolga <[email protected]> wrote:
> Hi,
>
> I crawl / index PDF files just fine, but I get the following warning.
>
> parse.ParserFactory - ParserFactory: Plugin: parse-pdf mapped to contentType
> application/pdf via parse-plugins.xml, but not enabled via plugin.includes
> in nutch-default.xml.
>
> I've got the value
> protocol-http|urlfilter-regex|parse-(html|tika|js|msexcel|mspowerpoint|msword|oo|pdf|swf|zip)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)
> for plugin.includes property in nutch-default.xml. What am I missing?
>
> Regards,

--
Lewis

