Hi,
JSParseFilter implements the filter function, which is called when HTML
pages are filtered. I found the following declaration in the plugin.xml
file under the $NUTCH_HOME\src\plugin\parse-js directory:
<extension id="org.apache.nutch.parse.js.JSParseFilter"
           name="Parse JS Filter"
           point="org.apache.nutch.parse.HtmlParseFilter">
  <implementation id="JSParseFilter"
                  class="org.apache.nutch.parse.js.JSParseFilter">
    <parameter name="contentType" value="application/x-javascript"/>
    <parameter name="pathSuffix" value=""/>
  </implementation>
</extension>
I assume this declaration is what causes the filter function in
JSParseFilter to be called.
My questions are:
1) I don't see the implementation id "JSParseFilter" used anywhere in the
parse-plugins.xml file under the $NUTCH_HOME\conf folder. How, then, does
Nutch know that this filter function should be called?
2) I want to replace this filter with my own filter, and I wrote the
following code:
<extension id="com.mycompany.nutch.parse.MyParseFilter"
           name="Parse JS Filter"
           point="org.apache.nutch.parse.HtmlParseFilter">
  <implementation id="MyParseFilter"
                  class="com.mycompany.nutch.parse.MyParseFilter">
  </implementation>
</extension>
and put it into the plugin.xml file under the $NUTCH_HOME\src\myplugin
directory. But my filter is never called. Any ideas?
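For reference, here is a minimal sketch of what a complete plugin.xml
around such an extension might look like, assuming the standard Nutch
plugin descriptor layout (a plugin root element with runtime, requires
and extension children). The plugin id "myplugin", the jar name
"myplugin.jar", the version and the provider-name are hypothetical
placeholders, not values from the snippet above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical descriptor: id, name, version, provider-name and jar name are placeholders -->
<plugin id="myplugin"
        name="My Parse Filter"
        version="1.0.0"
        provider-name="com.mycompany">

  <runtime>
    <!-- Jar assumed to contain com.mycompany.nutch.parse.MyParseFilter -->
    <library name="myplugin.jar">
      <export name="*"/>
    </library>
  </runtime>

  <requires>
    <!-- Makes the HtmlParseFilter extension point visible to this plugin -->
    <import plugin="nutch-extensionpoints"/>
  </requires>

  <extension id="com.mycompany.nutch.parse.MyParseFilter"
             name="My Parse Filter"
             point="org.apache.nutch.parse.HtmlParseFilter">
    <implementation id="MyParseFilter"
                    class="com.mycompany.nutch.parse.MyParseFilter"/>
  </extension>
</plugin>
```

Nutch only activates plugins whose id matches the plugin.includes
property (set in nutch-default.xml and overridable in nutch-site.xml),
so the id chosen in the plugin element would also have to be matched by
that regular expression.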
Thanks a lot.
Jeff