Hi Erick,

I didn't know about ClientUtils.escapeQueryChars(); this is good to know.
Unfortunately I cannot use it because it means I have to import Solr
classes into my client application.  I want to avoid that and keep a
loose coupling between my application and Solr (just rely on REST).
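
For reference, here is a minimal sketch of what escaping on the application
side without any Solr JARs could look like.  The class and method names
(QueryEscaper, escape, titleQueryParam) are hypothetical, and the character
list is an assumption modeled on what ClientUtils.escapeQueryChars() is
understood to escape; it would need to be checked against the Solr version
in use, which is exactly the maintenance burden I would like to avoid.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr or SolrJ.
final class QueryEscaper {
    // Assumed set of query-syntax characters; verify against your Solr version.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    /** Prefixes each special character and each whitespace character with a backslash. */
    static String escape(String userText) {
        StringBuilder sb = new StringBuilder(userText.length() * 2);
        for (int i = 0; i < userText.length(); i++) {
            char c = userText.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds the q parameter for a search on the "title" field, URL-encoded for a GET request. */
    static String titleQueryParam(String userText) {
        String q = "title:(" + escape(userText) + ")";
        return "q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(escape("Apache: Solr Notes"));
        System.out.println(titleQueryParam("Apache: Solr Notes"));
    }
}
```

Because whitespace is escaped too, the parser will likely see the user's
input as a single term rather than three, so whether to escape spaces is a
choice that depends on the matching behavior wanted.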

My suggestion is to add a new URL parameter to Solr, such as
"q.ignoreOperators=[true | false]" (or some other name).  If this parameter
is set to "false" or is missing, then the current behavior takes effect.  If
it is set to "true", then Solr will treat everything in the search string as
literal text by first passing it through ClientUtils.escapeQueryChars().
This way, the client application doesn't have to: a) be tightly coupled with
Solr (be required to link with Solr JARs to use escapeQueryChars), and
b) keep up with Solr when new operators are added.

What do you think?

Steve

On Mon, Apr 20, 2015 at 12:41 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Steve:
>
> In short, no. There's no good way for Solr to solve this problem in
> the _general_ case. Well, actually we could create parsers with rules
> like "if the colon is inside a paren, escape it." Which would
> completely break someone who wants to form queries like
>
> q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
> followed by a colon (:)").
>
> You say: " A better solution would be to have Solr support a new
> parameter that I can pass to Solr as part of the URL."
>
> How would Solr know _which_ parts of the URL to escape in the case above?
>
> You have to do this at the app layer as that's the only place that has
> a clue what the peculiarities of the situation are.
>
> But if you're using SolrJ in your app layer, you can use
> ClientUtils.escapeQueryChars() for user-entered data to do the
> escaping without you having to maintain a separate list.
>
> Best,
> Erick
>
> On Mon, Apr 20, 2015 at 8:39 AM, Steven White <swhite4...@gmail.com>
> wrote:
> > Hi Shawn,
> >
> > If the user types "title:(Apache: Solr Notes)" (without quotes) then I want
> > Solr to treat the whole string as raw text, as if I escaped ":", "("
> > and ")" and any other reserved Solr keywords / tokens.  Using dismax it
> > worked for the ":" case, but I still get a SyntaxError if I pass it the
> > following "title:(Apache: Solr Notes) AND" (here is the full URL):
> >
> >
> >
> http://localhost:8983/solr/db/select?q=title:(Apache:%20Solr%20Notes)%20AND&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND&defType=dismax&qf=title
> >
> > So far, the only solution I can find is for my application to escape all
> > Solr operators before sending the string to Solr.  This is fine, but it
> > means my application will have to adapt to Solr's reserved operators as
> > Solr grows (if Solr 5.x / 6.x adds a new operator, I have to add that to
> > my application's escape list).  A better solution would be to have Solr
> > support a new parameter that I can pass to Solr as part of the URL.
> > This parameter will tell Solr to do the escaping for me or not (missing
> > means the same as don't do the escaping).
> >
> > Thanks
> >
> > Steve
> >
> > On Mon, Apr 20, 2015 at 10:05 AM, Shawn Heisey <apa...@elyograg.org>
> wrote:
> >
> >> On 4/20/2015 7:41 AM, Steven White wrote:
> >> > In my application, a user types "Apache Solr Notes".  I take that text
> >> > and send it over to Solr like so:
> >> >
> >> >
> >> >
> >>
> http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND
> >> >
> >> > And I get a hit on "Apache Solr Release Notes".  This is all good.
> >> >
> >> > Now if the same user types "Apache: Solr Notes" (notice the ":" after
> >> > "Apache") I will get a SyntaxError.  The fix is to escape ":" before I
> >> > send it to Solr.  What I want to figure out is how can I tell Solr /
> >> > Lucene to ignore ":" and escape it for me?  In this example, I used ":"
> >> > but my need is for all other operators and reserved Solr / Lucene
> >> > characters.
> >>
> >> If we assume that what you did for the first query is what you will do
> >> for the second query, then this is what you would have sent:
> >>
> >> q=title:(Apache: Solr Notes)
> >>
> >> How is the parser supposed to know that only the second colon should be
> >> escaped, and not the first one?  If you escape them both (or treat the
> >> entire query string as query text), then the fact that you are searching
> >> the "title" field is lost.  The text "title" becomes an actual part of
> >> the query, and may not match, depending on what you have done with other
> >> parameters, such as the default operator.
> >>
> >> If you use the dismax parser (*NOT* the edismax parser, which parses
> >> field:value queries and boolean operator syntax just like the lucene
> >> parser), you may be able to achieve what you're after.
> >>
> >>
> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
> >> https://wiki.apache.org/solr/DisMaxQParserPlugin
> >>
> >> With dismax, you would use the qf and possibly the pf parameter to tell
> >> it which fields to search and send this as the query:
> >>
> >> q=Apache: Solr Notes
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
