Since the official OpenNLP filter is not yet in an actual release, I'm
experimenting with the OpenNLP filter implementation described in chapter 8
of the Taming Text book: http://www.manning.com/ingersoll/Sample-ch08.pdf

The original code is at
https://github.com/tamingtext/book/tree/master/src/main/java/com/tamingtext/texttamer/solr
I made a few minor changes to reflect the Solr 4.x interface changes. The
name filter described there should extract people, dates, locations, etc.
from the text.

Schema config:


Questions:
1. When I run a query on a term that shouldn't exist, such as
https://gist.github.com/anonymous/5060539 , I actually get 2 results back!

2. Looking at the index, I see "data" field entries such as ne_location
(data is the field I associated with fieldname text_opennlp).
However, searching for data:ne_location yields far fewer hits than
data:london. I don't expect a perfect match, given that this is an NLP-type
filter, but there appears to be something wrong with the way I'm looking at
this.

3. Running a simple analysis on a multi-sentence block of text, this is what
I see (screenshot at http://i.imgur.com/5ORgnRt.png). The filter appears to
work. However, from the query perspective, how could I query the processed
data for "John as a person" (as opposed to John as an organization)? I feel
that this could be better achieved by saving the relevant information to
other dedicated fields (person, organization, place, etc.) that map to
OpenNLP's capabilities. I'm open to ideas and suggestions here. I'm still
learning.
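
To make the dedicated-fields idea in question 3 more concrete, here is a
minimal sketch of what I have in mind. It is plain Java with no Solr or
OpenNLP calls: the Entity holder, the route method, and the field names
(person, organization, location) are all hypothetical, and the entities are
hard-coded where a real setup would take them from the name finder output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EntityFieldRouter {

    /** Hypothetical holder: a piece of text plus the entity type assigned to it. */
    public static final class Entity {
        final String type; // e.g. "person", "organization", "location"
        final String text; // e.g. "John"

        public Entity(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    /**
     * Groups entity texts by type, so that each type can be written to its
     * own field instead of being mixed into one field with the plain tokens.
     */
    public static Map<String, List<String>> route(List<Entity> entities) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Entity e : entities) {
            // The entity type doubles as the (assumed) target field name.
            fields.computeIfAbsent(e.type, k -> new ArrayList<>()).add(e.text);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hard-coded stand-ins for what a name finder might return.
        List<Entity> found = Arrays.asList(
            new Entity("person", "John"),
            new Entity("location", "London"),
            new Entity("organization", "John"));
        System.out.println(route(found));
    }
}
```

With the values split this way, "John as a person" would become a query like
person:John, and "John as an organization" would be organization:John,
assuming the schema defines fields with those names.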

Thanks






--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-am-I-doing-wrong-writing-an-OpenNLP-Filter-tp4043799.html
Sent from the Solr - User mailing list archive at Nabble.com.
