Thanks, guys, for the quick response.
If you could point me to a working example of a ParseFilter and/or
IndexFilter, that would be great.

Regards,
Tony


On Wed, Jun 12, 2013 at 5:46 PM, Julien Nioche <
[email protected]> wrote:

> They are called ParseFilters in 2.x:
> http://nutch.apache.org/apidocs-2.2/org/apache/nutch/parse/ParseFilter.html
> as they are not limited to processing HTML documents, since Tika generates
> SAX events for other mimetypes.
>
> J.
>
>
> On 12 June 2013 13:37, Tony Mullins <[email protected]> wrote:
>
> > Hi,
> >
> > If I go to http://wiki.apache.org/nutch/AboutPlugins, it shows me that
> > HTMLParseFilter is the extension point for adding custom metadata to HTML, and
> > its 'filter' method's signature is 'public ParseResult filter(Content
> > content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment
> > doc)', but that is in the 1.4 API doc.
> >
> > I am on Nutch 2.2, and there is no class named HTMLParseFilter in the
> > v2.2 API doc:
> > http://nutch.apache.org/apidocs-2.2/allclasses-noframe.html
> >
> > So please tell me which class to use in the v2.2 API for adding my custom
> > rule to extract some data from an HTML page (is it ParseFilter?) and add it
> > to the HTML metadata, so that I can later add it to my Solr index using an
> > indexfilter plugin.
> >
> >
> > Thanks,
> > Tony.
> >
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
