Sorry for bringing this question up again.
I modified the plugin.xml under the nutch_home/src/plugin/parse-js
directory to read:
<extension id="com.p2r.nutch.filtering.P2RHtmlFilter"
name="P2R Html Filter"
point="org.apache.nutch.parse.HtmlParseFilter">
<implementation id="JSParseFilter"
class="com.p2r.nutch.filtering.P2RHtmlFilter">
</implementation>
</extension>
Note that I kept only the implementation id as JSParseFilter and changed
all the other attributes to point to my own implementation class of the
HtmlParseFilter. With this setup, the P2RHtmlFilter code was called
successfully.
However, if I change the implementation id to "P2RHtmlFilter", as defined
in the following plugin.xml, my filter is not called:
<plugin
id="p2r-plugins"
name="P2R Plugins for Nutch"
version="0.0.1"
provider-name="p2r.com">
<runtime>
<library name="p2r-plugins.jar">
<export name="*"/>
</library>
</runtime>
<requires>
<import plugin="nutch-extensionpoints"/>
</requires>
<extension id="com.p2r.nutch.filtering.P2RHtmlFilter"
name="P2R Html Filter"
point="org.apache.nutch.parse.HtmlParseFilter">
<implementation id="P2RHtmlFilter"
class="com.p2r.nutch.filtering.P2RHtmlFilter">
</implementation>
</extension>
</plugin>
My nutch-site.xml already contains the plugin.includes property, which
looks like this:
<property>
<name>plugin.includes</name>
<value>nutch-extensionpoints|p2r-plugins|parse-html|protocol-http|urlfilter-regex|parse-(pdf|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
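As a sanity check on that value, here is a minimal standalone sketch that tests plugin ids against the pattern. It assumes Nutch treats plugin.includes as a Java regular expression that must match the whole plugin id (the id attribute of the <plugin> element, not the implementation id). The class and method names are made up for illustration and are not part of Nutch.

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {

    // The plugin.includes value from the nutch-site.xml above, joined onto one line.
    static final String INCLUDES =
        "nutch-extensionpoints|p2r-plugins|parse-html|protocol-http|"
        + "urlfilter-regex|parse-(pdf|js)|index-(basic|anchor)|query-(basic|site|"
        + "url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|"
        + "regex|basic)";

    // Assumption: a plugin is enabled only if its whole id matches the pattern.
    static boolean isIncluded(String includes, String pluginId) {
        return Pattern.compile(includes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Two plugin ids and one implementation id, for comparison.
        for (String id : new String[] {"p2r-plugins", "parse-js", "P2RHtmlFilter"}) {
            System.out.println(id + " -> " + isIncluded(INCLUDES, id));
        }
    }
}
```

Under that assumption, the plugin ids p2r-plugins and parse-js match the pattern, while the implementation id P2RHtmlFilter does not; only the plugin id needs to appear in plugin.includes.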
Does anyone know what I have missed?
Thanks a lot!
On Tue, 2010-07-13 at 09:43 +0100, Julien Nioche wrote:
> Hi Jeff,
>
> > 1) I don't see the implementation id "JSParseFilter" used in the
> > parse-plugins.xml file under the $NUTCH_HOME\conf folder. Then how does
> > Nutch know that this filter function should be called?
> >
>
> parse-plugins.xml lists Parsers, whereas JSParseFilter is an HTMLParseFilter.
> HTMLParseFilters get a DOM representation of the documents from the HTML or
> TikaParser.
> Nutch will automatically load the JSParseFilter along with the other
> HTMLParseFilters, provided that you list the corresponding plugin in
> plugin.includes.
>
>
> > 2) I want to replace this filter with my own filter, and I wrote the
> > following code:
> >
> > <extension id="com.mycompany.nutch.parse.MyParseFilter"
> > name="Parse JS Filter"
> > point="org.apache.nutch.parse.HtmlParseFilter">
> > <implementation id="MyParseFilter"
> > class="com.mycompany.nutch.parse.MyParseFilter">
> > </implementation>
> > </extension>
> >
> > and put it into the plugin.xml file under $NUTCH_HOME\src\myplugin
> > directory. But my filter is never called. Any ideas?
> >
>
> Have a look at the wiki page, e.g.
> http://wiki.apache.org/nutch/WritingPluginExample-0.9