Hi,
I'm the developer of the patch; it is not a product of the OpenNLP team.
Please sign up for the Solr JIRA and add this report to the LUCENE-2899
issue.
1)
The POS filters only attach payloads to the terms. Your query ignores
payloads, so I don't see what the OpenNLP filters in this analyzer
definition achieve. If you add a FilterPayloadFilter at the bottom of
the filter stack, you can limit the query to the words that were tagged.
2)
The POS algorithm is statistical: it is trained on both the words and
the pattern of surrounding words. A single word on its own may not be
recognized, whereas in 'a guy named Brett is here' the tagger will find
the word 'Brett'.
3)
The models were trained on old data. I think the name and organization
models were trained on newspaper data from 20 years ago, so the
organization filter will not find "Google".
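To make point 1 concrete, here is a rough sketch of a field type that tags tokens at index time and then keeps only the tagged words. Treat it as an untested sketch under assumptions: the payload filter factory name (solr.FilterPayloadsFilterFactory) and its payloadList attribute are my assumption about what the patch provides, so check the patch source for the exact names. NNP and NNPS are the Penn Treebank tags for proper nouns, and the field type name nlp_pos_type is made up for the example.

```xml
<fieldType name="nlp_pos_type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Attach a POS tag to each token as a payload -->
    <filter class="solr.OpenNLPFilterFactory"
            posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <!-- ASSUMPTION: factory name and payloadList attribute; verify against the patch.
         Intended to keep only tokens whose payload is a proper-noun tag. -->
    <filter class="solr.FilterPayloadsFilterFactory" payloadList="NNP,NNPS"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- No OpenNLP filters here: the query ignores payloads -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With the tagging done at index time, the tagger sees whole sentences rather than a single query word, which should also help with point 2.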
Lance
On 09/26/2013 10:01 AM, rashi gandhi wrote:
Hi,
I am working on OpenNLP integration with Solr. I have successfully
applied the patch (LUCENE-2899-x.patch) to the latest Solr source code
(branch_4x).
I have defined an OpenNLP analyzer and indexed data with it. The
analyzer declaration in schema.xml is as follows:
<fieldType name="nlp_type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Sequence of tokenizers and filters applied at index time -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Sequence of tokenizers and filters applied at query time -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.OpenNLPFilterFactory"
            posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.OpenNLPFilterFactory"
            nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.OpenNLPFilterFactory"
            nerTaggerModels="opennlp/en-ner-location.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>
And the field declared with this type:
<field name="Detail_Person" type="nlp_type" indexed="true"
stored="true" omitNorms="true" omitPositions="true"/>
The problem is this: when I search on the field Detail_Person, the
results are not consistent.
When I search for Detail_Person:brett, it returns one document.
But when I run the same query again, it returns zero documents.
Here are the logs:
97139 [http-bio-8080-exec-9] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
97139 [http-bio-8080-exec-9] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
97139 [http-bio-8080-exec-9] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
97154 [http-bio-8080-exec-9] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:rashi&wt=json} hits=1 status=0 QTime=15
97154 [http-bio-8080-exec-9] DEBUG org.apache.solr.servlet.SolrDispatchFilter – Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:rashi&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}
134874 [http-bio-8080-exec-3] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
134890 [http-bio-8080-exec-3] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
134890 [http-bio-8080-exec-3] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
134906 [http-bio-8080-exec-3] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:brett&wt=json} hits=2 status=0 QTime=32
134906 [http-bio-8080-exec-3] DEBUG org.apache.solr.servlet.SolrDispatchFilter – Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:brett&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}
147136 [http-bio-8080-exec-3] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:john&wt=json} hits=0 status=0 QTime=0
147136 [http-bio-8080-exec-3] DEBUG org.apache.solr.servlet.SolrDispatchFilter – Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:john&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}
302164 [http-bio-8080-exec-10] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
302164 [http-bio-8080-exec-10] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
302164 [http-bio-8080-exec-10] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
302164 [http-bio-8080-exec-10] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:john&wt=json} hits=1 status=0 QTime=15
302164 [http-bio-8080-exec-10] DEBUG org.apache.solr.servlet.SolrDispatchFilter – Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:john&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}
Searching on the OpenNLP field is not stable: sometimes it returns
documents and sometimes it does not, even though the documents are there.
Searches on non-OpenNLP fields work properly; the results are stable
and correct.
Please help me make the Solr results consistent.
Thanks in advance.