Hi Jeff,

> 1) I don't see the implementation id of "JSParseFilter" is used in the
> parse-plugins.xml file under the $NUTCH_HOME\conf folder. Then how does
> Nutch knows that this filter function should be called?
>

parse-plugins.xml lists Parsers, whereas JSParseFilter is an HtmlParseFilter.
HtmlParseFilters receive a DOM representation of the document from the HTML
or Tika parser.
Nutch loads JSParseFilter automatically, along with the other
HtmlParseFilters, provided that the corresponding plugin is listed in the
plugin.includes property.
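
As a rough sketch of what that looks like, plugin.includes is usually
overridden in conf/nutch-site.xml. The value is a regular expression matched
against plugin ids. The list below is only illustrative and is not your
actual configuration; parse-js is the plugin that ships JSParseFilter, and
"myplugin" is a hypothetical id standing in for a custom plugin:

```xml
<!-- conf/nutch-site.xml (illustrative value, adapt to your own setup) -->
<property>
  <name>plugin.includes</name>
  <!-- parse-js provides JSParseFilter; "myplugin" is a hypothetical custom plugin id -->
  <value>protocol-http|urlfilter-regex|parse-(html|tika|js)|index-basic|myplugin</value>
</property>
```

A plugin whose id does not match this expression is not loaded, so none of
its filters are called.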


> 2) I want to replace this filter with my own filter, and I wrote the
> follow code:
>
> <extension id="com.mycompany.nutch.parse.MyParseFilter"
>              name="Parse JS Filter"
>              point="org.apache.nutch.parse.HtmlParseFilter">
>      <implementation id="MyParseFilter"
>         class="com.mycompany.nutch.parse.MyParseFilter">
>      </implementation>
>   </extension>
>
> and put it into the plugin.xml file under $NUTCH_HOME\src\myplugin
> directory. But my filter is never called. Any ideas?
>

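The same condition applies to your own filter: Nutch only loads it if its
plugin is listed in plugin.includes. For a complete walk-through, have a
look at the wiki page, e.g.
http://wiki.apache.org/nutch/WritingPluginExample-0.9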
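
For orientation, here is a minimal sketch of a complete plugin.xml, assuming
the <extension> element from your mail is wrapped in a <plugin> element. The
plugin id "myplugin" and the jar name "myplugin.jar" are hypothetical and
need to match your own build. In the source tree, plugins normally live under
src/plugin/<plugin-id>/ and the plugin id is what plugin.includes is matched
against:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "myplugin" and "myplugin.jar" are hypothetical names for illustration -->
<plugin id="myplugin"
        name="My Parse Filter"
        version="1.0.0"
        provider-name="com.mycompany">

  <runtime>
    <!-- jar containing com.mycompany.nutch.parse.MyParseFilter -->
    <library name="myplugin.jar">
      <export name="*"/>
    </library>
  </runtime>

  <requires>
    <import plugin="nutch-extensionpoints"/>
  </requires>

  <extension id="com.mycompany.nutch.parse.MyParseFilter"
             name="Parse JS Filter"
             point="org.apache.nutch.parse.HtmlParseFilter">
    <implementation id="MyParseFilter"
                    class="com.mycompany.nutch.parse.MyParseFilter"/>
  </extension>

</plugin>
```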




-- 
DigitalPebble Ltd

Open Source Solutions for Text Engineering
http://www.digitalpebble.com
