Thanks, guys, for the quick response. If you could point me to any working example of ParseFilter and/or IndexFilter, that would be great.
Regards,
Tony

On Wed, Jun 12, 2013 at 5:46 PM, Julien Nioche <[email protected]> wrote:

> They are called ParseFilters in 2.x:
> http://nutch.apache.org/apidocs-2.2/org/apache/nutch/parse/ParseFilter.html
> as they are not limited to processing HTML documents, since Tika generates
> SAX events for other mimetypes.
>
> J.
>
> On 12 June 2013 13:37, Tony Mullins <[email protected]> wrote:
>
> > Hi,
> >
> > If I go to http://wiki.apache.org/nutch/AboutPlugins, it shows that
> > HTMLParseFilter is the extension point for adding custom metadata to HTML,
> > and that its 'filter' method's signature is 'public ParseResult
> > filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags,
> > DocumentFragment doc)', but that is in the 1.4 API doc.
> >
> > I am on Nutch 2.2, and there is no class named HTMLParseFilter in the
> > v2.2 API doc:
> > http://nutch.apache.org/apidocs-2.2/allclasses-noframe.html
> >
> > So please tell me which class to use in the v2.2 API for adding my custom
> > rule to extract some data from an HTML page (is it ParseFilter?) and add
> > it to the HTML metadata, so that I can later add it to Solr using an
> > indexfilter plugin.
> >
> > Thanks,
> > Tony.
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble

