On Tue, 2010-07-13 at 09:43 +0100, Julien Nioche wrote:
> Hi Jeff,
>
> > 1) I don't see the implementation id of "JSParseFilter" used in the
> > parse-plugins.xml file under the $NUTCH_HOME\conf folder. Then how does
> > Nutch know that this filter function should be called?
> >
>
> parse-plugins.xml lists Parsers whereas JSParseFilter is a HTMLParseFilter.
> HTMLParseFilters get a DOM representation of the documents from the HTML or
> TikaParser.
> Nutch will automatically load the JSParseFilter along with other
> HTMLParseFilters provided that you list the corresponding plugin in
> plugin.includes
Three questions:
1) The implementation id of JSParseFilter is not in the plugin.includes
section, but JSParseFilter is still being called. Why?
2) There are two extension declarations in the parse-js.xml file under the
JSParseFilter folder:
<extension id="org.apache.nutch.parse.js"
           name="JS Parser"
           point="org.apache.nutch.parse.Parser">
  <implementation id="JSParser"
                  class="org.apache.nutch.parse.js.JSParseFilter">
    <parameter name="contentType" value="application/x-javascript"/>
    <parameter name="pathSuffix" value="js"/>
  </implementation>
</extension>
<extension id="org.apache.nutch.parse.js.JSParseFilter"
           name="Parse JS Filter"
           point="org.apache.nutch.parse.HtmlParseFilter">
  <implementation id="JSParseFilter"
                  class="org.apache.nutch.parse.js.JSParseFilter">
    <parameter name="contentType" value="application/x-javascript"/>
    <parameter name="pathSuffix" value=""/>
  </implementation>
</extension>
The first implementation id, "JSParser", is in the parse-plugins.xml file,
while the second isn't. The interesting thing is that if I comment out the
whole second implementation section, the HTMLParseFilters no longer
work correctly. Hence, the second should be what is executed when a
filter function is called. But why is the first one put into the
parse-plugins.xml file?
3) If I want to keep the JSParser's filter implementation and also write
my own filter implementation, how does Nutch determine which filter
implementation will be used?
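A sketch of the setup behind question 3, assuming a hypothetical plugin
id "myplugin" and an illustrative plugin list (neither is taken from a
real configuration): both plugins are enabled in conf/nutch-site.xml, so
two implementations end up registered on
org.apache.nutch.parse.HtmlParseFilter.

    <configuration>
      <property>
        <name>plugin.includes</name>
        <!-- Regular expression over plugin ids; "myplugin" is a
             hypothetical id used only for illustration. -->
        <value>protocol-http|urlfilter-regex|parse-(text|html|js)|myplugin</value>
      </property>
    </configuration>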
>
>
> > 2) I want to replace this filter with my own filter, and I wrote the
> > following code:
> >
> > <extension id="com.mycompany.nutch.parse.MyParseFilter"
> >            name="Parse JS Filter"
> >            point="org.apache.nutch.parse.HtmlParseFilter">
> >   <implementation id="MyParseFilter"
> >                   class="com.mycompany.nutch.parse.MyParseFilter">
> >   </implementation>
> > </extension>
> >
> > and put it into the plugin.xml file under $NUTCH_HOME\src\myplugin
> > directory. But my filter is never called. Any ideas?
> >
>
> Have a look at the wiki page e.g
> http://wiki.apache.org/nutch/WritingPluginExample-0.9
>
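For reference, the overall shape of a plugin.xml as described on that
wiki page can be sketched as below. The plugin id, plugin name, jar name,
version, and provider are hypothetical placeholders; only the extension
element comes from the message quoted above.

    <?xml version="1.0" encoding="UTF-8"?>
    <plugin id="myplugin" name="My Parse Filter" version="0.0.1"
            provider-name="mycompany.com">
      <!-- Jar holding the compiled filter class; the file name is an
           assumption. -->
      <runtime>
        <library name="myplugin.jar">
          <export name="*"/>
        </library>
      </runtime>
      <!-- Makes the core extension points, including HtmlParseFilter,
           visible to this plugin. -->
      <requires>
        <import plugin="nutch-extensionpoints"/>
      </requires>
      <extension id="com.mycompany.nutch.parse.MyParseFilter"
                 name="Parse JS Filter"
                 point="org.apache.nutch.parse.HtmlParseFilter">
        <implementation id="MyParseFilter"
                        class="com.mycompany.nutch.parse.MyParseFilter"/>
      </extension>
    </plugin>

As far as the wiki example goes, the plugin id (here "myplugin") is also
the name that the plugin.includes pattern has to match.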
I followed that page and wrote my plugin successfully. But the
article doesn't cover how to give one plugin function priority over a
similar plugin function, as mentioned in 3).
Thanks a lot.