Yes, an UpdateRequestProcessor is the API to implement for this sort of 
requirement. In the URP you have access to a SolrInputDocument object that 
carries the input data. You can inspect the fields, add, remove or modify 
fields if you want, or discard the input altogether.

So, check your text input field for profanity and set a separate boolean field 
to record whether it matched. Whether you detect profanity with a list of 
words, an SVM or another machine learning algorithm is up to you.
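
As a rough illustration of the word-list variant, here is a minimal sketch in 
plain Java. It uses a Map as a stand-in for the document so it runs without 
Solr on the classpath; the class name, the field names and the tokenizing 
regex are all made up for the example. In a real URP the same check would 
live in your processAdd(AddUpdateCommand) override, reading and writing the 
SolrInputDocument you get from cmd.getSolrInputDocument().

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Solr: flags a document when any token
// of a text field appears on a predefined word list.
final class ProfanityFlagger {
    private final Set<String> blocked = new HashSet<>();
    private final String textField;
    private final String flagField;

    ProfanityFlagger(Set<String> words, String textField, String flagField) {
        // Lower-case the list once so matching is case-insensitive.
        for (String w : words) {
            blocked.add(w.toLowerCase(Locale.ROOT));
        }
        this.textField = textField;
        this.flagField = flagField;
    }

    // Always sets flagField: true if a token matched, false otherwise
    // (including when the text field is missing).
    void flag(Map<String, Object> doc) {
        Object value = doc.get(textField);
        boolean hit = false;
        if (value != null) {
            // Naive tokenization: split on anything that is not a letter or digit.
            String[] tokens = value.toString()
                    .toLowerCase(Locale.ROOT)
                    .split("[^\\p{L}\\p{Nd}]+");
            for (String token : tokens) {
                if (blocked.contains(token)) {
                    hit = true;
                    break;
                }
            }
        }
        doc.put(flagField, hit);
    }
}
```

This only matches whole tokens, so inflected forms and deliberate 
misspellings slip through unless they are on the list. That is the point at 
which a stemmer or a classifier might be worth the effort.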

Cheers,
Markus
 
-----Original message-----
> From:Sadiki Latty <sla...@uottawa.ca>
> Sent: Monday 8th January 2018 22:12
> To: solr-user@lucene.apache.org
> Subject: Profanity
> 
> Hey
> 
> I would like to find a solution to flag profanity at index time. Ideally, 
> it would function similarly to stopwords, in the sense that I can have a 
> predefined list that is read, and if a token is on the list, the document 
> is 'flagged' in a different field. Does anyone know of a solution (outside of 
> configuring my own)? If none exists and I end up configuring my own, would I 
> be doing this in the update processor phase? I am still fairly new to Solr, 
> but from what I've read, that seems to be the best place to look.
> 
> 
> Thanks,
> 
> Sid
> 
