Since the official OpenNLP filter is not yet in an actual release, I'm experimenting with the OpenNLP filter implementation described in chapter 8 of the Taming Text book: http://www.manning.com/ingersoll/Sample-ch08.pdf
The original code is at https://github.com/tamingtext/book/tree/master/src/main/java/com/tamingtext/texttamer/solr . I made a few minor changes to reflect the Solr 4.x interface changes. The Name filter described there should extract people, dates, locations, etc. from the text.

Schema config:

Questions:

1. When I run a query on a term that shouldn't exist, such as the one at https://gist.github.com/anonymous/5060539 , I actually get 2 results back!

2. Looking at the index, I see "data" field entries such as ne_location (data is the field I associated with fieldname text_opennlp). However, searching for data:ne_location yields far fewer hits than data:london . I don't expect a perfect match, given that this is an NLP-type filter, but there appears to be something wrong with the way I'm looking at this.

3. Running a simple analysis on a multi-sentence block of text, this is what I see (screenshot at http://i.imgur.com/5ORgnRt.png). The filter appears to work. From the query perspective, however, how could I query the processed data for "John as a person" (as opposed to John as an organization)? I feel this could be better achieved by saving the relevant information to other dedicated fields (person, organization, place, etc.) that map to OpenNLP's capabilities.

Open to ideas and suggestions here. I'm still learning.

Thanks
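
To make the "dedicated fields" idea from question 3 more concrete, here is a minimal sketch in plain Java, with no Solr or OpenNLP dependency. It assumes the entities have already been extracted as (type, text) pairs. The class name EntityFieldRouter, the Entity holder, and the type names "person" and "location" are all made up for illustration and do not come from the Taming Text code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups extracted entities by type so that each type
// could be written to its own field instead of being mixed into "data".
public class EntityFieldRouter {

    /** One extracted entity: a type such as "person" plus the matched text. */
    public static final class Entity {
        final String type;
        final String text;

        public Entity(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    /** Returns a map from entity type (the assumed field name) to its values. */
    public static Map<String, List<String>> route(List<Entity> entities) {
        // LinkedHashMap keeps the types in first-seen order.
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Entity e : entities) {
            fields.computeIfAbsent(e.type, k -> new ArrayList<>()).add(e.text);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<Entity> found = new ArrayList<>();
        found.add(new Entity("person", "John"));
        found.add(new Entity("location", "London"));
        found.add(new Entity("person", "Mary"));
        // Each key would become a separate multi-valued field on the document.
        System.out.println(route(found));
    }
}
```

With the values split out like this, "John as a person" would become an ordinary query against a person field (something like person:John, assuming such a field is defined in the schema) rather than a search for marker tokens inside data.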