Hi Erick,

I think you missed my point.  My request is that Solr support a new URL
parameter.  If this parameter is set, then EVERYTHING in q is treated as
raw text (i.e., Solr does the escaping instead of the client).

Thanks

Steve

On Mon, Apr 20, 2015 at 1:08 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> How does that address the example query I gave?
>
> q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
> followed by a colon (:)").
>
> bq: "Solr will treat everything in the search string by first passing
> it to ClientUtils.escapeQueryChars()."
>
> would incorrectly escape the colons after field1, field, field2 and
> correctly escape the colon after d and in parens. And parens are a
> reserved character too, so it would incorrectly escape _all_ the
> parens except the ones surrounding the colon.
>
> The list of reserved characters is pretty unchanging, so I don't think
> it's too much to ask the app layer to handle this. The app layer knows
> (or at least had better know) which bits of the query were user-entered
> and what rules govern whether the user can enter field-qualified
> searches, etc. Only armed with that knowledge can the right thing be
> done, and Solr has no knowledge of those rules.
>
> If you insist that the client shouldn't deal with that, you could
> always write a custom component that enforces the rules that are
> particular to your setup. For instance, you may have a rule that you
> can never field-qualify any term, in which case escaping on the Solr
> side would work in _your_ situation. But the general case just doesn't
> fit into the "escape on the Solr side" paradigm.
>
> Best,
> Erick
>
>
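Erick's point can be made concrete with a minimal Java sketch that uses only the JDK, with no SolrJ dependency. It contrasts escaping only the user-entered text with escaping the whole query string. The class and method names are hypothetical. The reserved-character set is an assumption based on the documented standard Lucene/Solr query syntax; it is not copied from `ClientUtils.escapeQueryChars()`. The sketch escapes single characters only. The uppercase keywords `AND`, `OR` and `NOT` are words rather than characters, so they pass through unchanged.

```java
// Hypothetical, JDK-only helper for the standard (lucene) query parser.
// ASSUMPTION: RESERVED mirrors the documented Lucene/Solr query syntax;
// check it against the Solr version you run.
final class UserQueryEscaper {

    // '&' and '|' are escaped one at a time, which neutralizes "&&" and "||".
    private static final String RESERVED = "\\+-!():^[]\"{}~*?|&/";

    private UserQueryEscaper() {}

    /** Backslash-escapes each reserved character; whitespace is left alone. */
    static String escape(String userText) {
        StringBuilder sb = new StringBuilder(userText.length() + 8);
        for (int i = 0; i < userText.length(); i++) {
            char c = userText.charAt(i);
            if (RESERVED.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** The app chooses the field; only the user's text is escaped. */
    static String fieldQuery(String field, String userText) {
        return field + ":(" + escape(userText) + ")";
    }

    public static void main(String[] args) {
        String user = "Apache: Solr Notes";
        // Selective escaping: "title" stays a field qualifier.
        System.out.println(fieldQuery("title", user));
        // Blanket escaping: the field qualifier becomes literal query text.
        System.out.println(escape("title:(" + user + ")"));
    }
}
```

Running `main` prints `title:(Apache\: Solr Notes)` for the selective version and `title\:\(Apache\: Solr Notes\)` for the blanket version. In the second string `title` is no longer a field name, which is the problem Erick and Shawn describe.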
> On Mon, Apr 20, 2015 at 9:55 AM, Steven White <swhite4...@gmail.com>
> wrote:
> > Hi Erick,
> >
> > I didn't know about ClientUtils.escapeQueryChars(); this is good to know.
> > Unfortunately I cannot use it because it means I have to import Solr
> > classes into my client application.  I want to avoid that and keep a
> > loose coupling between my application and Solr (just rely on REST).
> >
> > My suggestion is to add a new URL parameter to Solr, such as
> > "q.ignoreOperators=[true | false]" (or some other name).  If this
> > parameter is set to "false" or is missing, then the current behavior
> > takes effect; if it is set to "true", then Solr will treat everything
> > in the search string by first passing it to
> > ClientUtils.escapeQueryChars().  This way, the client application
> > doesn't have to: a) be tightly coupled with Solr (required to link with
> > Solr JARs to use escapeQueryChars), and b) keep up with Solr when new
> > operators are added.
> >
> > What do you think?
> >
> > Steve
> >
> > On Mon, Apr 20, 2015 at 12:41 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> Steve:
> >>
> >> In short, no. There's no good way for Solr to solve this problem in
> >> the _general_ case. Well, actually we could create parsers with rules
> >> like "if the colon is inside a paren, escape it", but that would
> >> completely break someone who wants to form queries like
> >>
> >> q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
> >> followed by a colon (:)").
> >>
> >> You say: " A better solution would be to have Solr support a new
> >> parameter that I can pass to Solr as part of the URL."
> >>
> >> How would Solr know _which_ parts of the URL to escape in the case
> >> above?
> >>
> >> You have to do this at the app layer as that's the only place that has
> >> a clue what the peculiarities of the situation are.
> >>
> >> But if you're using SolrJ in your app layer, you can use
> >> ClientUtils.escapeQueryChars() for user-entered data to do the
> >> escaping without you having to maintain a separate list.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Apr 20, 2015 at 8:39 AM, Steven White <swhite4...@gmail.com>
> >> wrote:
> >> > Hi Shawn,
> >> >
> >> > If the user types "title:(Apache: Solr Notes)" (without quotes), then I
> >> > want Solr to treat the whole string as a raw text string, as if I had
> >> > escaped ":", "(" and ")" and any other reserved Solr keywords / tokens.
> >> > Using dismax, it worked for the ":" case, but I still get a SyntaxError
> >> > if I pass it the following "title:(Apache: Solr Notes) AND" (here is
> >> > the full URL):
> >> >
> >> > http://localhost:8983/solr/db/select?q=title:(Apache:%20Solr%20Notes)%20AND&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND&defType=dismax&qf=title
> >> >
> >> > So far, the only solution I can find is for my application to escape
> >> > all Solr operators before sending the string to Solr.  This is fine,
> >> > but it means my application will have to adapt to Solr's reserved
> >> > operators as Solr grows (if Solr 5.x / 6.x adds a new operator, I have
> >> > to add that to my application's escape list).  A better solution would
> >> > be to have Solr support a new parameter that I can pass to Solr as
> >> > part of the URL.  This parameter will tell Solr whether or not to do
> >> > the escaping for me (missing means the same as don't do the escaping).
> >> >
> >> > Thanks
> >> >
> >> > Steve
> >> >
> >> > On Mon, Apr 20, 2015 at 10:05 AM, Shawn Heisey <apa...@elyograg.org>
> >> > wrote:
> >> >
> >> >> On 4/20/2015 7:41 AM, Steven White wrote:
> >> >> > In my application, a user types "Apache Solr Notes".  I take that
> >> >> > text and send it over to Solr like so:
> >> >> >
> >> >> > http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND
> >> >> >
> >> >> > And I get a hit on "Apache Solr Release Notes".  This is all good.
> >> >> >
> >> >> > Now if the same user types "Apache: Solr Notes" (notice the ":"
> >> >> > after "Apache") I will get a SyntaxError.  The fix is to escape ":"
> >> >> > before I send it to Solr.  What I want to figure out is how I can
> >> >> > tell Solr / Lucene to ignore ":" and escape it for me.  In this
> >> >> > example I used ":", but I need this for all other operators and
> >> >> > reserved Solr / Lucene characters.
> >> >>
> >> >> If we assume that what you did for the first query is what you will
> >> >> do for the second query, then this is what you would have sent:
> >> >>
> >> >> q=title:(Apache: Solr Notes)
> >> >>
> >> >> How is the parser supposed to know that only the second colon should
> >> >> be escaped, and not the first one?  If you escape them both (or treat
> >> >> the entire query string as query text), then the fact that you are
> >> >> searching the "title" field is lost.  The text "title" becomes an
> >> >> actual part of the query, and may not match, depending on what you
> >> >> have done with other parameters, such as the default operator.
> >> >>
> >> >> If you use the dismax parser (*NOT* the edismax parser, which parses
> >> >> field:value queries and boolean operator syntax just like the lucene
> >> >> parser), you may be able to achieve what you're after.
> >> >>
> >> >> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
> >> >> https://wiki.apache.org/solr/DisMaxQParserPlugin
> >> >>
> >> >> With dismax, you would use the qf and possibly the pf parameter to
> >> >> tell it which fields to search and send this as the query:
> >> >>
> >> >> q=Apache: Solr Notes
> >> >>
> >> >> Thanks,
> >> >> Shawn
> >> >>
> >> >>
> >>
>

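Shawn's dismax suggestion can be sketched as client-side URL construction. The raw user text goes into `q` unescaped, and the application picks the field through `qf`. This is a minimal Java sketch that assumes the core URL from this thread (`http://localhost:8983/solr/db/select`). The `DismaxUrl` class is hypothetical, and the optional `pf` parameter Shawn mentions is not set. The code only percent-encodes for HTTP; it does no query-syntax escaping. Steve reports above that a trailing `AND` still produced a SyntaxError with dismax in his setup, so this is not a complete answer for that case.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds a /select URL that sends raw user text to dismax.
final class DismaxUrl {

    private DismaxUrl() {}

    static String build(String selectUrl, String userText) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userText);        // raw user input, no query-syntax escaping
        params.put("defType", "dismax");  // dismax, not edismax (edismax parses field:value)
        params.put("qf", "title");        // the app, not the user, picks the field(s)
        params.put("fl", "id,score,title");
        params.put("wt", "xml");
        String queryString = params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return selectUrl + "?" + queryString;
    }

    // Percent-encodes for HTTP transport only (form encoding: space becomes '+').
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/db/select", "Apache: Solr Notes"));
    }
}
```

`URLEncoder.encode(String, Charset)` requires Java 10 or later. On older JVMs, the `encode(String, String)` overload with `"UTF-8"` does the same job but throws a checked exception.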