Thanks Hoss and Yonik. Hoss, you had a particularly pertinent passage:

> ... because the normal Lucene QueryParser uses whitespace ...
> and breaks up the input on the whitespace boundaries
> before it ever passes those chunks ... to the analyzers
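To make the quoted passage concrete, here is a toy Java simulation of why a multiword synonym rule never fires when the query is split on whitespace before analysis. It is not Lucene or Solr code: the synonym rule "new york" => "nyc", the class name, and the methods are all hypothetical, chosen only to mimic the behavior described.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class WhitespacePreSplitDemo {

    // Hypothetical multiword synonym rule: the two-token phrase "new york" maps to "nyc".
    static final Map<List<String>, String> SYNONYMS =
            Map.of(List.of("new", "york"), "nyc");

    // Toy "analyzer": lowercase, tokenize on whitespace, then replace any
    // two-token phrase found in SYNONYMS with its single-token synonym.
    static List<String> analyze(String text) {
        List<String> tokens = Arrays.asList(text.toLowerCase().trim().split("\\s+"));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (i + 1 < tokens.size()) {
                String syn = SYNONYMS.get(List.of(tokens.get(i), tokens.get(i + 1)));
                if (syn != null) {
                    out.add(syn);
                    i++; // skip the second token of the matched phrase
                    continue;
                }
            }
            out.add(tokens.get(i));
        }
        return out;
    }

    // Mimics the described query parser behavior: split on whitespace first,
    // then hand each chunk to the analyzer on its own.
    static List<String> analyzePreSplit(String query) {
        List<String> out = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            out.addAll(analyze(chunk));
        }
        return out;
    }

    public static void main(String[] args) {
        String q = "New York pizza";
        System.out.println("whole string: " + analyze(q));         // [nyc, pizza]
        System.out.println("pre-split:    " + analyzePreSplit(q)); // [new, york, pizza]
    }
}
```

Analyzing the whole string lets the synonym rule see "new" and "york" side by side; analyzing each whitespace-separated chunk separately means the rule only ever sees one token at a time, so it cannot match.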
This is EXACTLY what the issue is. At first I thought it was the result of using dismax, but from what you said, I'm guessing it affects all queries. Does anybody have a "worked" example of engineering around it?

Yonik, I was surprised by your IBM comments, because based on what they had presented at the meetup, I also thought it would be more "granular". Have you chatted with them to confirm?

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513

On Thu, Aug 20, 2009 at 7:16 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

> : Subject: Overview of Query Parsing API Stack? / Dismax parsing,
> : new 1.4 parsing, etc.
>
> Oh, what I would give for time to sit and document in depth how some of
> this stuff works (assuming I first had time to verify that it really does
> work the way I think).
>
> The nutshell answer is that as far as Solr (1.4) is concerned, the main
> unit of "query parsing" is a QParser ... lots of places in the code base
> may care about parsing different strings for the purposes of producing a
> Query object, but ultimately they all use a QParser.
>
> QParsers are plugins that you can configure instances of in your
> solrconfig.xml and assign names to. By default, all of the various pieces
> of code in Solr that do any sort of query-related parsing use some basic
> convention to pick a QParser by name -- so StandardRequestHandler uses the
> QParser named "lucene" for parsing the "q" param, while
> DisMaxRequestHandler uses a QParser named "dismax" for "q", and "func" for
> the "bf" param. So if you wanted to make some change so that *any* code
> path anywhere attempting to use the Lucene syntax got your custom query
> parsing logic, you could configure a QParser with the name "lucene" and
> override the default.
>
> The brilliantly confusing magic comes into play when strings to be parsed
> start with the "local params" syntax (i.e., "{!foo a=f b=z}blah blah" ...
> that tells the parsing code to override whatever QParser it would have
> used for that string, and to pass everything after the "}" character to
> the parser named "foo", with a=f and b=z added to the list of SolrParams
> it's already got (from the query string, or default params in solrconfig,
> etc...)
>
> For most types of queries, the QParser ultimately uses Lucene's
> "QueryParser" class, or some subclass of it (the DisMaxQueryParser used by
> the DisMaxQParserPlugin is a subclass of QueryParser), and 9 times out of 10
> if people want to customize query parsing without inventing a 100% new
> syntax, they also write a subclass.
>
> Coming in Lucene 2.9 (which is what Solr 1.4 will use) is a completely new
> QueryParser framework, which (I'm told) is supposed to make it much easier
> to create custom query parser syntaxes, but I haven't had time to look at
> it to see what all the fuss is about. So in theory you could use it to
> implement a new QParserPlugin in Solr 1.4.
>
> No matter how you ultimately implement code that goes from "String" to
> "Query", you have to be concerned about the type of data in the field that
> the Query object refers to (if it was lowercased at index time, you want
> to lowercase at query time, etc...).
> Solr does its best to help query
> parsers out by supporting an <analyzer type="query"/> in the schema.xml so
> that the schema creator can specify how to "analyze" a piece of input when
> building queries, but depending on the query syntax it's not always easy
> to get the behavior you expect from a particular query parser / analyzer
> pair. (This part of query parsing typically trips people up when dealing
> with multiword synonyms, or analyzers that don't tokenize on whitespace,
> because the normal Lucene QueryParser uses whitespace as part of its
> markup, and breaks up the input on the whitespace boundaries before it
> ever passes those chunks of input to the analyzers.)
>
> : But trying to traipse through the code to get "the big picture" is a bit
> : involved.
>
> Like I said: the world of query parsing in Solr all revolves around the
> QParser API ... if you want to make sense of it, start there, and work out
> in both directions.
>
> PS: please, please, please ... as you make progress on understanding these
> internals, feel free to plagiarize this email as the starting point of a
> new wiki page documenting your understanding for others who come along
> with the same question.
>
> -Hoss
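The "local params" dispatch described in the quoted message ("{!foo a=f b=z}blah blah" selects the parser named "foo", adds a=f and b=z to its params, and hands it only the text after "}") can be illustrated with a simplified splitter. This is a hypothetical sketch, not Solr's own parsing code: it handles only a leading bare parser name and unquoted, whitespace-separated key=value pairs, whereas real local params support more (quoted values, for example), and the class, record, and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified sketch of local params splitting; not Solr's code.
public class LocalParamsSketch {

    /** Result of splitting a "{!name k=v ...}rest" string (hypothetical holder). */
    record Parsed(String parserName, Map<String, String> params, String query) {}

    static Parsed parse(String input, String defaultParser) {
        if (!input.startsWith("{!")) {
            // No local params: the caller's default parser gets the whole string.
            return new Parsed(defaultParser, Map.of(), input);
        }
        int close = input.indexOf('}');
        if (close < 0) {
            throw new IllegalArgumentException("unterminated local params: " + input);
        }
        String inside = input.substring(2, close).trim();
        String parserName = defaultParser;
        Map<String, String> params = new LinkedHashMap<>();
        for (String part : inside.split("\\s+")) {
            if (part.isEmpty()) {
                continue; // e.g. "{!}" has nothing inside
            }
            int eq = part.indexOf('=');
            if (eq < 0) {
                parserName = part; // bare token, e.g. "foo" in {!foo ...}
            } else {
                params.put(part.substring(0, eq), part.substring(eq + 1));
            }
        }
        // Everything after "}" is what the selected parser actually parses.
        return new Parsed(parserName, params, input.substring(close + 1));
    }

    public static void main(String[] args) {
        Parsed p = parse("{!foo a=f b=z}blah blah", "lucene");
        // prints: foo {a=f, b=z} [blah blah]
        System.out.println(p.parserName() + " " + p.params() + " [" + p.query() + "]");
    }
}
```

In this sketch the caller's default name ("lucene" above) only applies when no override is present, which mirrors the "override whatever QParser it would have used" behavior; a real implementation would then look the chosen name up among the configured QParser plugins.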