Re: "Indexed without position data" - strange exception in eDisMax (Solr 4.0beta)

Jack Krupansky Mon, 01 Oct 2012 11:56:34 -0700

"autoGeneratePhraseQueries" simply means that when a term has embedded
punctuation that gets filtered to whitespace, the resulting sub-terms are
treated as if they were a quoted phrase, so A-D gets treated as "A D", a
phrase.

Did you maybe index the initial data with different field types and then
change to the current field types, say from "string" to "text_..."?
If so, simply delete the index and re-index.
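
As a rough sketch of where this setting lives (not taken from your actual
schema): the attribute sits on the fieldType element, so turning it off
would look something like the fragment below. The analyzer chain is elided
here and assumed to stay as shipped in the example schema. It only affects
how the query parser builds the query, so with it off I would expect A-D to
be parsed as separate sub-term clauses rather than as a phrase.

```xml
<!-- Sketch only: same type name as in the example schema, with phrase
     generation switched off. The analyzer definitions are omitted here
     and are assumed to be unchanged. -->
<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <!-- analyzer definitions unchanged -->
</fieldType>
```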


-- Jack Krupansky

-----Original Message-----
From: Alexandre Rafalovitch
Sent: Monday, October 01, 2012 12:32 PM
To: solr-user@lucene.apache.org
Subject: Re: "Indexed without position data" - strange exception in eDisMax (Solr 4.0beta)


I use text_en_splitting from example distribution, which does have
autoGeneratePhraseQueries:
       <fieldType name="text_en_splitting" class="solr.TextField"
positionIncrementGap="100" autoGeneratePhraseQueries="true">

But I do not see the other omit options anywhere in solrconfig.xml, and
definitely not in the field or field type definition:
<field name="NamesEN" type="text_en_splitting" multiValued="true"
indexed="true" stored="true" />

I am not sure what the defaults are either. I am using version 1.5 of
the schema and it says in the example schema.xml:
      1.2: omitTermFreqAndPositions attribute introduced, true by
default except for text fields.
Does that mean it is true by default for "text_en_splitting", or false
because it is a text field? The wiki does not say anything else.

I can disable autoGeneratePhraseQueries as the next step, I guess
(still not sure exactly what it does anyway), but I remain
somewhat confused about the other two options.

Regards,
  Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)

On Mon, Oct 1, 2012 at 12:00 PM, Jack Krupansky <j...@basetechnology.com> wrote:

You probably have omitTermFreqAndPositions=true or omitPositions=true in
your schema for that field. You MUST have position info to use a phrase
query.
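
To rule the defaults out, one option is to spell both attributes out on the
field. This is only a sketch built from the field definition quoted in this
thread; for a solr.TextField-based type both should already default to
false, so setting them explicitly just makes the intent visible. A full
re-index is needed after changing either one.

```xml
<!-- Sketch: the two omit* attributes are added for illustration;
     all other attributes are copied from the definition in this thread. -->
<field name="NamesEN" type="text_en_splitting" multiValued="true"
       indexed="true" stored="true"
       omitTermFreqAndPositions="false" omitPositions="false" />
```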

-- Jack Krupansky

-----Original Message----- From: Alexandre Rafalovitch
Sent: Monday, October 01, 2012 11:53 AM
To: solr-user@lucene.apache.org
Subject: "Indexed without position data" - strange exception in eDisMax
(Solr 4.0beta)

I am getting a very strange exception when I use the edismax handler and
the search query contains a keyword with a dash (but only some keywords
with a dash).

The exception is:
1-Oct-2012 11:45:38 AM org.apache.solr.core.SolrCore execute
INFO: [kb] webapp=/solr path=/select
params={facet=true&facet.field=Schema&facet.field=EventCode&facet.field=ThemeCodes&facet.field=AichiCodes&q=(Schema:(sideEvent)+AND+A-D)&rows=0&version=2.2}
status=500 QTime=14
1-Oct-2012 11:45:38 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.IllegalStateException: field "NamesEN" was indexed without position data; cannot run PhraseQuery (term=a)
       at org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:274)
       at org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.scorer(DisjunctionMaxQuery.java:160)
       at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:318)
       at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:571)
       at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:275)
       at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1514)
       at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1261)
       at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:390)
       at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:411)
       at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
       at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)

The field definition it complains about is:
<field name="NamesEN" type="text_en_splitting" multiValued="true"
indexed="true" stored="true" />
And the eDisMax definition is (in solrconfig.xml):
<str name="qf">
.... Id NamesEN^5 Organizations ....
</str>

The strange thing is that it does not seem to happen for all "X-Y"
sequences. Here is one that works just before:
1-Oct-2012 11:45:22 AM org.apache.solr.core.SolrCore execute
INFO: [kb] webapp=/solr path=/select
params={facet=true&facet.field=Schema&facet.field=EventCode&facet.field=ThemeCodes&facet.field=AichiCodes&q=(Schema:(sideEvent)+AND+ABC-D)&rows=0&version=2.2}
hits=13 status=0 QTime=105

I don't mind if the results are slightly off, I am still tuning the
full text search. But I am not sure what to do with the exception
above. Do I need to 'index position data' somehow? Do I need to escape
the dash? Did I hit a rare bug in the "handle anything thrown at it" eDisMax?

Any pointers would be appreciated.

Regards,
  Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)
