They are called ParseFilters in 2.x:
http://nutch.apache.org/apidocs-2.2/org/apache/nutch/parse/ParseFilter.html
The name changed because they are not limited to processing HTML documents:
Tika generates SAX events for other MIME types as well.
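
In case it helps, here is a rough sketch of the extraction step only, i.e.
pulling a value out of the DOM before storing it as metadata. It assumes the
2.x filter method still hands you a DocumentFragment, as the 1.4 signature
quoted below does; please check the javadoc linked above for the exact 2.2
signature. It uses only JDK classes and no Nutch API. The names
HeadingExtractor, firstElementText and sampleFragment are made up for
illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: not part of Nutch, plain JDK DOM only.
public class HeadingExtractor {

    // Depth-first search for the first element whose name matches tagName
    // (case-insensitive). Returns its trimmed text, or null if none is found.
    static String firstElementText(Node node, String tagName) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && tagName.equalsIgnoreCase(node.getNodeName())) {
            return node.getTextContent().trim();
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            String found = firstElementText(child, tagName);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Builds a tiny stand-in for the fragment a parser would pass to a filter.
    static DocumentFragment sampleFragment() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            DocumentFragment frag = doc.createDocumentFragment();
            Element body = doc.createElement("body");
            Element h1 = doc.createElement("h1");
            h1.setTextContent("  Hello Nutch ");
            body.appendChild(h1);
            frag.appendChild(body);
            return frag;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the text of the first <h1> in the sample fragment.
        System.out.println(firstElementText(sampleFragment(), "h1"));
    }
}
```

Inside a real filter you would call something like firstElementText(doc, "h1")
on the fragment you are given and then store the result wherever your
indexing filter expects to find it; that wiring is left out here on purpose.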

J.


On 12 June 2013 13:37, Tony Mullins <[email protected]> wrote:

> Hi,
>
> The page at http://wiki.apache.org/nutch/AboutPlugins shows
> HTMLParseFilter as the extension point for adding custom metadata to HTML,
> and its 'filter' method's signature is 'public ParseResult filter(Content
> content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment
> doc)', but that is in the 1.4 API doc.
>
> I am on Nutch 2.2 and there is no class named HTMLParseFilter in the v2.2
> API doc:
> http://nutch.apache.org/apidocs-2.2/allclasses-noframe.html
>
> So please tell me which class to use in the v2.2 API to add a custom rule
> that extracts some data from an HTML page (is it ParseFilter?) and adds it
> to the HTML metadata, so that I can later add it to Solr using an
> indexfilter plugin.
>
>
> Thanks,
> Tony.
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
