Hi Jeff,

> 1) I don't see the implementation id of "JSParseFilter" is used in the
> parse-plugins.xml file under the $NUTCH_HOME\conf folder. Then how does
> Nutch knows that this filter function should be called?

parse-plugins.xml lists Parsers, whereas JSParseFilter is an HtmlParseFilter. HtmlParseFilters get a DOM representation of the document from the HTML or Tika parser. Nutch will automatically load JSParseFilter along with the other HtmlParseFilters, provided that you list the corresponding plugin in plugin.includes.

> 2) I want to replace this filter with my own filter, and I wrote the
> follow code:
>
> <extension id="com.mycompany.nutch.parse.MyParseFilter"
>            name="Parse JS Filter"
>            point="org.apache.nutch.parse.HtmlParseFilter">
>   <implementation id="MyParseFilter"
>                   class="com.mycompany.nutch.parse.MyParseFilter">
>   </implementation>
> </extension>
>
> and put it into the plugin.xml file under $NUTCH_HOME\src\myplugin
> directory. But my filter is never called. Any ideas?

Have a look at the wiki page, e.g. http://wiki.apache.org/nutch/WritingPluginExample-0.9

-- 
DigitalPebble Ltd
Open Source Solutions for Text Engineering
http://www.digitalpebble.com
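
P.S. To illustrate the plugin.includes point, here is a rough sketch of what the override in conf/nutch-site.xml might look like. It assumes your plugin's id (the id attribute of the <plugin> element in its plugin.xml) is "myplugin", which is a hypothetical name taken from your directory name. The rest of the pattern is only illustrative, so start from whatever value your installation already uses and append your plugin's id to it.

```xml
<!-- Sketch only: "myplugin" is an assumed plugin id, and the other
     entries are placeholders for the plugins you already enable. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|js)|index-basic|myplugin</value>
</property>
```

The value is a regular expression matched against plugin ids, so a plugin whose id does not match is not loaded and its filters are never called.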

