Hi-

I'm the developer of this patch; it is not a product of the OpenNLP team. Please sign up for the Solr JIRA and add this report to the LUCENE-2899 issue.

1)
The POS filters only add payloads to the terms. Your query ignores payloads, so I don't see the point of this definition. If you add a FilterPayloadFilter to the bottom of the stack (the end of the filter chain), you can limit the query to the words that were found.

2)
The POS algorithm is statistical, and it trains on both the words and the pattern of surrounding words. A single word on its own may not be tagged, whereas 'a guy named Brett is here' will find the word 'Brett'.

3)
The POS models are trained on old data. I think the names and organizations models were trained on newspaper data from 20 years ago. The organizations filter will not find "Google".
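To make point 1 concrete, here is a toy model in Python (not the patch's code): each token carries a term and an optional payload, an ordinary term query only looks at the term text, and a payload filter at the end of the chain drops every token whose payload is not in an allowed set. The `Token` class, the helper names and the "person" payload value are all hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional, Set


class Token(NamedTuple):
    """Hypothetical token: the indexed term text plus an optional payload tag."""
    term: str
    payload: Optional[str]


def search_terms(tokens: Iterable[Token]) -> List[str]:
    """An ordinary term query sees only the term text; payloads are ignored."""
    return [tok.term for tok in tokens]


def filter_by_payload(tokens: Iterable[Token], keep: Set[str]) -> Iterator[Token]:
    """Keep only tokens whose payload is in `keep`; untagged tokens are dropped."""
    for tok in tokens:
        if tok.payload in keep:
            yield tok


# Illustrative token stream; the "person" tag is an assumed payload value.
SENTENCE = [
    Token("a", None), Token("guy", None), Token("named", None),
    Token("brett", "person"), Token("is", None), Token("here", None),
]

print(search_terms(SENTENCE))                                 # every term stays searchable
print(search_terms(filter_by_payload(SENTENCE, {"person"})))  # only the tagged name survives
```

Without the payload filter every word is searchable, so the tagging has no effect on matching; with it, only the word the tagger marked is left in the stream.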

Lance

On 09/26/2013 10:01 AM, rashi gandhi wrote:

Hi,

I am working on OpenNLP integration with Solr. I have successfully applied the patch (LUCENE-2899-x.patch) to the latest Solr source code (branch_4x).

I have designed an OpenNLP analyzer and indexed data with it. The analyzer declaration in schema.xml is:

<fieldType name="nlp_type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Sequence of tokenizers and filters applied at index time -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Sequence of tokenizers and filters applied at query time -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-location.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

And the field declared with this type:

<field name="Detail_Person" type="nlp_type" indexed="true" stored="true" omitNorms="true" omitPositions="true"/>

The problem is this: when I search the Detail_Person field, the results are not consistent.

When I search Detail_Person:brett, it returns one document, but when I run the same query again, it returns zero documents.


These are the logs:

97139 [http-bio-8080-exec-9] INFO org.apache.solr.analysis.OpenNLPFilterFactory - OpenNLPFilterFactory create
97139 [http-bio-8080-exec-9] INFO org.apache.solr.analysis.OpenNLPFilterFactory - OpenNLPFilterFactory create
97139 [http-bio-8080-exec-9] INFO org.apache.solr.analysis.OpenNLPFilterFactory - OpenNLPFilterFactory create
97154 [http-bio-8080-exec-9] INFO org.apache.solr.core.SolrCore - [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:rashi&wt=json} hits=1 status=0 QTime=15
97154 [http-bio-8080-exec-9] DEBUG org.apache.solr.servlet.SolrDispatchFilter - Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:rashi&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}
134874 [http-bio-8080-exec-3] INFO org.apache.solr.analysis.OpenNLPFilterFactory - OpenNLPFilterFactory create
134890 [http-bio-8080-exec-3] INFO org.apache.solr.analysis.OpenNLPFilterFactory - OpenNLPFilterFactory create
134890 [http-bio-8080-exec-3] INFO org.apache.solr.analysis.OpenNLPFilterFactory - OpenNLPFilterFactory create
134906 [http-bio-8080-exec-3] INFO org.apache.solr.core.SolrCore - [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:brett&wt=json} hits=2 status=0 QTime=32
134906 [http-bio-8080-exec-3] DEBUG org.apache.solr.servlet.SolrDispatchFilter - Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:brett&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}
147136 [http-bio-8080-exec-3] INFO org.apache.solr.core.SolrCore - [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:john&wt=json} hits=0 status=0 QTime=0
147136 [http-bio-8080-exec-3] DEBUG org.apache.solr.servlet.SolrDispatchFilter - Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:john&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}
302164 [http-bio-8080-exec-10] INFO org.apache.solr.analysis.OpenNLPFilterFactory - OpenNLPFilterFactory create
302164 [http-bio-8080-exec-10] INFO org.apache.solr.analysis.OpenNLPFilterFactory - OpenNLPFilterFactory create
302164 [http-bio-8080-exec-10] INFO org.apache.solr.analysis.OpenNLPFilterFactory - OpenNLPFilterFactory create
302164 [http-bio-8080-exec-10] INFO org.apache.solr.core.SolrCore - [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:john&wt=json} hits=1 status=0 QTime=15
302164 [http-bio-8080-exec-10] DEBUG org.apache.solr.servlet.SolrDispatchFilter - Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:john&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}
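To make the pattern easier to see, here is a minimal Python sketch (not part of Solr) that pulls the query and hit count out of the SolrCore request lines in the log above, so repeated runs of the same query can be compared side by side. The regular expression assumes the log layout shown here; the function name `hits_per_query` is just an illustrative helper.

```python
import re
from typing import Iterable, List, Tuple

# Matches the query and hit count in a SolrCore request line such as
# "... params={fl=score,*&indent=true&q=Detail_Person:brett&wt=json} hits=2 status=0 QTime=32".
REQUEST = re.compile(r"[{&]q=([^&}]+).*?\bhits=(\d+)")


def hits_per_query(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (query, hits) for each SolrCore request line, in log order."""
    results = []
    for line in lines:
        if "org.apache.solr.core.SolrCore" not in line:
            continue  # skip filter-factory and dispatch-filter lines
        match = REQUEST.search(line)
        if match:
            results.append((match.group(1), int(match.group(2))))
    return results


# Lines copied from the log above (one factory line kept to show it is skipped).
LOG = [
    "97139 [http-bio-8080-exec-9] INFO org.apache.solr.analysis.OpenNLPFilterFactory - OpenNLPFilterFactory create",
    "97154 [http-bio-8080-exec-9] INFO org.apache.solr.core.SolrCore - [collection1] webapp=/solr "
    "path=/select params={fl=score,*&indent=true&q=Detail_Person:rashi&wt=json} hits=1 status=0 QTime=15",
    "134906 [http-bio-8080-exec-3] INFO org.apache.solr.core.SolrCore - [collection1] webapp=/solr "
    "path=/select params={fl=score,*&indent=true&q=Detail_Person:brett&wt=json} hits=2 status=0 QTime=32",
    "147136 [http-bio-8080-exec-3] INFO org.apache.solr.core.SolrCore - [collection1] webapp=/solr "
    "path=/select params={fl=score,*&indent=true&q=Detail_Person:john&wt=json} hits=0 status=0 QTime=0",
    "302164 [http-bio-8080-exec-10] INFO org.apache.solr.core.SolrCore - [collection1] webapp=/solr "
    "path=/select params={fl=score,*&indent=true&q=Detail_Person:john&wt=json} hits=1 status=0 QTime=15",
]

for query, hits in hits_per_query(LOG):
    print(f"{query}: {hits} hit(s)")
```

Running it lists Detail_Person:john twice, first with 0 hits and then with 1, which is the inconsistency described here.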


Searching on the OpenNLP field is not stable: sometimes it returns documents and sometimes it does not, even though the documents are there.

Searching on non-OpenNLP fields works properly; the results are stable and correct.

Please help me make the Solr results consistent.

Thanks in advance.
