"autoGeneratePhraseQueries" simply means that when a term contains embedded
punctuation that analysis filters out as if it were whitespace, the resulting
sub-terms are treated as if they were a quoted phrase, so A-D gets treated as
"A D", a phrase.
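For comparison, here is a sketch of the same field type with the option
switched off. It assumes the text_en_splitting definition from the example
schema; the analyzer chain is omitted here and would stay as it is. With this
setting, A-D should be parsed into separate term queries for its sub-terms
rather than a phrase, so that query would not need position data:

```xml
<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <!-- analyzer definitions unchanged from the example schema (not shown) -->
</fieldType>
```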
Did you maybe index the initial data with different field types and then
change to the current field types, say "string" and then move to "text_..."?
If so, simply delete the index and re-index.
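A minimal sketch of the delete step, assuming the stock XML update handler at
/update on your core. Post the two messages below as separate requests, then
re-run your indexing. Stopping Solr and removing the core's index directory
should have the same effect.

```xml
<!-- Request 1: remove every document from the index -->
<delete><query>*:*</query></delete>

<!-- Request 2 (separate POST): make the deletion visible -->
<commit/>
```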
-- Jack Krupansky
-----Original Message-----
From: Alexandre Rafalovitch
Sent: Monday, October 01, 2012 12:32 PM
To: solr-user@lucene.apache.org
Subject: Re: "Indexed without position data" - strange exception in eDisMax
(Solr 4.0beta)
I use text_en_splitting from example distribution, which does have
autoGeneratePhraseQueries:
<fieldType name="text_en_splitting" class="solr.TextField"
positionIncrementGap="100" autoGeneratePhraseQueries="true">
But I do not see the other omit options anywhere in solrconfig.xml and
definitely not in the field or field type definition:
<field name="NamesEN" type="text_en_splitting" multiValued="true"
indexed="true" stored="true" />
I am not sure what the defaults are either. I am using version 1.5 of
the schema and it says in the example schema.xml:
1.2: omitTermFreqAndPositions attribute introduced, true by
default except for text fields.
Does that mean it is true by default for "text_en_splitting", or false
because it is a text field? The wiki does not say anything else.
I can disable autoGeneratePhraseQueries as the next step, I guess
(still not sure exactly what it does anyway), but I remain
somewhat confused about the other two options.
Regards,
Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
book)
On Mon, Oct 1, 2012 at 12:00 PM, Jack Krupansky <j...@basetechnology.com>
wrote:
You probably have omitTermFreqAndPositions=true or omitPositions=true in
your schema for that field. You MUST have position info to use a phrase
query.
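One way to rule this out is to spell the options out on the field itself. A
sketch, assuming the NamesEN definition from the original message; the two
omit attributes are added here purely for illustration, and any change to
them only takes effect for documents indexed afterwards:

```xml
<!-- Explicitly keep term frequencies and positions for this field -->
<field name="NamesEN" type="text_en_splitting" multiValued="true"
       indexed="true" stored="true"
       omitTermFreqAndPositions="false" omitPositions="false" />
```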
-- Jack Krupansky
-----Original Message-----
From: Alexandre Rafalovitch
Sent: Monday, October 01, 2012 11:53 AM
To: solr-user@lucene.apache.org
Subject: "Indexed without position data" - strange exception in eDisMax
(Solr 4.0beta)
I am getting a very strange exception when I use the edismax handler and
the search query contains a keyword with a dash (but only some keywords
with a dash).
The exception is:
1-Oct-2012 11:45:38 AM org.apache.solr.core.SolrCore execute
INFO: [kb] webapp=/solr path=/select
params={facet=true&facet.field=Schema&facet.field=EventCode&facet.field=ThemeCodes&facet.field=AichiCodes&q=(Schema:(sideEvent)+AND+A-D)&rows=0&version=2.2}
status=500 QTime=14
1-Oct-2012 11:45:38 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.IllegalStateException: field "NamesEN" was
indexed without position data; cannot run PhraseQuery (term=a)
    at org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:274)
    at org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.scorer(DisjunctionMaxQuery.java:160)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:318)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:571)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:275)
    at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1514)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1261)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:390)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:411)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
The field definition it complains about is:
<field name="NamesEN" type="text_en_splitting" multiValued="true"
indexed="true" stored="true" />
And the eDisMax definition is (in solrconfig.xml):
<str name="qf">
.... Id NamesEN^5 Organizations ....
</str>
The strange thing is that it does not seem to happen for all 'X-Y' sequences.
Here is one that worked just before:
1-Oct-2012 11:45:22 AM org.apache.solr.core.SolrCore execute
INFO: [kb] webapp=/solr path=/select
params={facet=true&facet.field=Schema&facet.field=EventCode&facet.field=ThemeCodes&facet.field=AichiCodes&q=(Schema:(sideEvent)+AND+ABC-D)&rows=0&version=2.2}
hits=13 status=0 QTime=105
I don't mind if the results are slightly off; I am still tuning the
full text search. But I am not sure what to do with the exception
above. Do I need to 'index position data' somehow? Do I need to escape
the dash? Did I hit a rare bug in the "handle anything thrown at it" eDisMax?
Any pointers would be appreciated.
Regards,
Alex.