Hi,

It would be better to open a separate thread for the JUnit question.

About the filter issue: are you using Nutch's search or Solr? Both use Lucene and support query operators that prohibit a term. If you are using Solr, please consult the appropriate docs, wiki and mailing list on how to proceed. I have no experience with Nutch's search capability, but as it also uses Lucene I imagine it allows these operators as well.

Using these operators you can exclude documents that contain certain terms from your search results. If you filter those documents out beforehand, you cannot query for them later.

Check this for information on the LuceneQParser syntax:
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html

Cheers,

> Hi folks,
>
> I am sorry for adding another question to the same mail. I am also writing
> a plug-in extending HtmlParser. How do I test it with JUnit?
>
> I see the "filter" method takes Content content, ParseResult
> parseResult, HTMLMetaTags metaTags, DocumentFragment doc as arguments. How
> can I generate these parameters for test purposes?
>
> Thanks,
> Abi
>
> On Tue, Feb 1, 2011 at 12:10 PM, .: Abhishek :. <[email protected]> wrote:
> > Hi all,
> >
> > I am planning to implement a negative keyword indexer such that if a
> > negative keyword appears in a segment, it should never show up during
> > the search. I have the following steps in mind; please let me know if
> > they are right.
> >
> > - Writing a plug-in
> >   - Extend the IndexingFilter.
> >   - Do a NutchDocument.removeField for the negative keyword.
> >   - Return the doc.
> >
> > Now the questions are,
> >
> > - The NutchDocument is always mapped as a HTML page, so if I am doing
> >   the thing above, am I really removing the segment from getting indexed
> >   or am I preventing the page from being indexed?
> >
> > Also, please let me know whether what I am intending to do is right.
> > Thanks again all for your time.
> >
> > Cheers,
> > Abhi
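
P.S. To make the query-time approach concrete, here is a minimal sketch of building a query string that uses Lucene's prohibit operator ("-") from the syntax page linked above. The function name build_query and the sample keywords are hypothetical, chosen only for illustration; the sketch assumes plain keywords with no Lucene special characters, and it only produces the string you would hand to whichever search front end you use.

```python
def build_query(required_terms, excluded_terms):
    """Return a Lucene query-parser string that prohibits the excluded terms.

    Hypothetical helper for illustration; assumes terms contain no Lucene
    special characters that would need escaping.
    """
    if not required_terms:
        # Per the Lucene syntax docs, a query made up only of prohibited
        # terms returns no results, so insist on at least one positive term.
        raise ValueError("at least one required term is needed")

    def clause(term):
        # Multi-word phrases must be wrapped in double quotes.
        return f'"{term}"' if " " in term else term

    positive = [clause(t) for t in required_terms]
    negative = ["-" + clause(t) for t in excluded_terms]
    return " ".join(positive + negative)


if __name__ == "__main__":
    # prints: nutch -casino -"free money"
    print(build_query(["nutch"], ["casino", "free money"]))
```

Because the exclusion happens at query time, the documents stay in the index and can still be found by queries that do not prohibit those keywords.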

