We'd like to implement a profanity detector for documents during indexing.
That is, given a file of profane words, we'd like to mark a document as
not safe if it contains any of those words, and as safe otherwise, so that
we can offer something similar to Google's SafeSearch.

I'm trying to figure out how best to implement this with Solr 1.4:

- An UpdateRequestProcessor would allow me to dynamically populate a "safe"
boolean field, but it requires me to pull out the content, tokenize it and
run each token through my set of profanities, essentially running the
analysis pipeline a second time.  That's a lot of overhead AFAIK.

- A TokenFilter would allow me to tap into the existing analysis pipeline, so
I get the tokens for free, but I can't access the document from there to set
the "safe" field.
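
For reference, the check itself is small. Below is a minimal sketch in plain
Java of the lookup the UpdateRequestProcessor option would need to run. It
deliberately uses no Solr or Lucene classes, and the class name
ProfanityCheck, the method isSafe and the sample word list are hypothetical
names chosen for illustration. The tokenization is a crude regex split, which
is an assumption and will not match what the real analyzer produces.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Solr: decides whether a document's text
// is "safe" given a set of lowercased profane words.
public class ProfanityCheck {

    // Returns true when no token of the content is in the profanity set.
    // Assumes the set holds lowercased words (e.g. loaded from the word file).
    static boolean isSafe(String content, Set<String> profanities) {
        if (content == null) {
            return true; // nothing to flag
        }
        // Crude tokenization (an assumption): lowercase the text, then split
        // on runs of characters that are neither letters nor digits.
        String[] tokens = content.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (profanities.contains(token)) {
                return false; // first hit is enough to mark the document
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Placeholder words standing in for the real profanity file.
        Set<String> profanities = new HashSet<String>(Arrays.asList("darn", "heck"));
        System.out.println(isSafe("What the heck is this?", profanities));       // false
        System.out.println(isSafe("A perfectly polite sentence.", profanities)); // true
    }
}
```

Inside a processor, the result would be written to the "safe" field of the
incoming document before it is passed down the chain. The cost is one pass
over the text plus a hash lookup per token, on top of the analysis Solr
already does for the indexed field.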

Any suggestions on how to best implement this?

Thanks in advance,
mike
