write code in java for nutch for index filter

hala Sun, 10 Apr 2011 08:57:30 -0700


hello: im working in write a code in java for nutch(open source search
engine) to remove the movments from arabic words in the indexer. i dont know
what is the error in it this is the code:
***************************************************
package com.mycompany.nutch.indexing; import
org.apache.hadoop.conf.Configuration; import org.apache.hadoop.io.Text;
import org.apache.log4j.Logger; import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks; import
org.apache.nutch.indexer.IndexingException; import
org.apache.nutch.indexer.IndexingFilter; import
org.apache.nutch.indexer.NutchDocument; import
org.apache.nutch.parse.getData().parse.getData();


public class InvalidUrlIndexFilter implements IndexingFilter {

private static final Logger LOGGER =
Logger.getLogger(InvalidUrlIndexFilter.class);

private Configuration conf;

public void addIndexBackendOptions(Configuration conf) { // NOOP return; }

public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
CrawlDatum datum, Inlinks inlinks) throws IndexingException { if (url ==
null) { return null; }

char[] parse.getData() = input.trim().toCharArray();
    for(int p=0;p<parse.getData().length;p++)
     
if(!(parse.getData()[p]=='َ'||parse.getData()[p]=='ً'||parse.getData()[p]=='ُ'||parse.getData()[p]=='ِ'||parse.getData()[p]=='ٍ'||parse.getData()[p]=='ٌ'
||parse.getData()[p]=='ّ'||parse.getData()[p]=='ْ' ||parse.getData()[p]=='"'
))
        new String.append(parse.getData()[p]);

return doc;

}

public Configuration getConf() { return conf; }

public void setConf(Configuration conf) { this.conf = conf; } }
****************************************************************
i think that the error in using parse.getdata() but i dont know what i
should use instead of it???????????


--
View this message in context: 
http://lucene.472066.n3.nabble.com/write-code-in-java-for-nutch-for-index-filter-tp2801728p2801728.html
Sent from the Nutch - User mailing list archive at Nabble.com.

write code in java for nutch for index filter

Reply via email to