"autoGeneratePhraseQueries" simply means that when a term contains embedded punctuation that analysis turns into whitespace, the resulting sub-terms are treated as if they were a quoted phrase. So A-D gets treated as "A D", a phrase.
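
If you want to try turning it off, here is a minimal sketch based on the fieldType from your message. Only the attribute value is changed, and the analyzer chain is assumed to stay as it is in the example schema:

```xml
<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <!-- analyzer definitions unchanged from the example schema (omitted here) -->
</fieldType>
```

With the attribute set to false, as I understand it, the sub-terms of A-D are combined as ordinary term clauses rather than a phrase. An explicitly quoted phrase would still need position data.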

Did you perhaps index the initial data with a different field type, say "string", and then switch to the current "text_..." type? If so, simply delete the index and re-index.

-- Jack Krupansky

-----Original Message-----
From: Alexandre Rafalovitch
Sent: Monday, October 01, 2012 12:32 PM
To: solr-user@lucene.apache.org
Subject: Re: "Indexed without position data" - strange exception in eDisMax (Solr 4.0beta)

I use text_en_splitting from example distribution, which does have
autoGeneratePhraseQueries:
       <fieldType name="text_en_splitting" class="solr.TextField"
positionIncrementGap="100" autoGeneratePhraseQueries="true">

But I do not see the other omit options anywhere in solrconfig.xml, and
definitely not in the field or field type definition:
<field name="NamesEN" type="text_en_splitting" multiValued="true"
indexed="true" stored="true" />

I am not sure what the defaults are either. I am using version 1.5 of
the schema and it says in the example schema.xml:
      1.2: omitTermFreqAndPositions attribute introduced, true by
default except for text fields.
Does that mean it is true by default for "text_en_splitting", or false
because it is a text field? The wiki does not say anything more.

I can disable autoGeneratePhraseQueries as the next step, I guess
(still not sure exactly what it does anyway), but I remain
somewhat confused about the other two options.

Regards,
  Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Mon, Oct 1, 2012 at 12:00 PM, Jack Krupansky <j...@basetechnology.com> wrote:
You probably have omitTermFreqAndPositions=true or omitPositions=true in
your schema for that field. You MUST have position info to use phrase query.
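
For reference, a sketch of what the field would look like with position data made explicit. The field name and type are taken from the question; the two omit attributes are added here for illustration and were not in the posted schema:

```xml
<!-- Hypothetical variant of the posted field: both omit attributes are
     written out as "false" so term positions are indexed regardless of
     what the defaults turn out to be. -->
<field name="NamesEN" type="text_en_splitting" multiValued="true"
       indexed="true" stored="true"
       omitTermFreqAndPositions="false" omitPositions="false" />
```

A change like this only affects documents indexed afterwards, so the existing index would need to be rebuilt.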

-- Jack Krupansky

-----Original Message-----
From: Alexandre Rafalovitch
Sent: Monday, October 01, 2012 11:53 AM
To: solr-user@lucene.apache.org
Subject: "Indexed without position data" - strange exception in eDisMax
(Solr 4.0beta)


I am getting a very strange exception when I use the edismax handler and
the search query contains a keyword with a dash (but only some keywords
with a dash).

The exception is:
1-Oct-2012 11:45:38 AM org.apache.solr.core.SolrCore execute
INFO: [kb] webapp=/solr path=/select
params={facet=true&facet.field=Schema&facet.field=EventCode&facet.field=ThemeCodes&facet.field=AichiCodes&q=(Schema:(sideEvent)+AND+A-D)&rows=0&version=2.2}
status=500 QTime=14
1-Oct-2012 11:45:38 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.IllegalStateException: field "NamesEN" was
indexed without position data; cannot run PhraseQuery (term=a)
       at
org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:274)
       at
org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.scorer(DisjunctionMaxQuery.java:160)
       at
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:318)
       at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:571)
       at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:275)
       at
org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1514)
       at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1261)
       at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:390)
       at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:411)
       at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
       at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
       at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
       at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)

The field definition it complains about is:
<field name="NamesEN" type="text_en_splitting" multiValued="true"
indexed="true" stored="true" />
And the eDisMax definition is (in solrconfig.xml):
<str name="qf">
.... Id NamesEN^5 Organizations ....
</str>

The strange thing is that it does not seem to happen for all 'X-Y' sequences.
Here is one that works just before:
1-Oct-2012 11:45:22 AM org.apache.solr.core.SolrCore execute
INFO: [kb] webapp=/solr path=/select
params={facet=true&facet.field=Schema&facet.field=EventCode&facet.field=ThemeCodes&facet.field=AichiCodes&q=(Schema:(sideEvent)+AND+ABC-D)&rows=0&version=2.2}
hits=13 status=0 QTime=105


I don't mind if the results are slightly off, as I am still tuning the
full text search. But I am not sure what to do with the exception
above. Do I need to 'index position data' somehow? Do I need to escape
the dash? Did I hit a rare bug in the "handle anything thrown at it" eDisMax?

Any pointers would be appreciated.

Regards,
  Alex.

