How does that address the example query I gave?

q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
followed by a colon (:)").

bq: "Solr will treat everything in the search string by first passing
it to ClientUtils.escapeQueryChars()."

That would incorrectly escape the colons after field1, field and field2,
and correctly escape the colon after d and the one in parens. And parens
are reserved characters too, so it would incorrectly escape _all_ the
parens except the pair surrounding the colon inside the quoted phrase.
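
To make that concrete, here is a rough Python sketch of what blanket
escaping does to a field-qualified query. The helper escape_query_chars
and the RESERVED list are hypothetical stand-ins, modeled on my
understanding of what ClientUtils.escapeQueryChars() escapes; check the
list against your Solr version rather than trusting this one.

```python
# Characters treated as query syntax. This list is an assumption modeled
# on SolrJ's ClientUtils.escapeQueryChars(); verify it for your version.
RESERVED = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(text):
    """Backslash-escape every reserved character and all whitespace."""
    return "".join(
        "\\" + ch if ch in RESERVED or ch.isspace() else ch
        for ch in text
    )


# A query the application built on purpose, with real field qualifiers.
query = "field1:whatever AND (a AND field:b)"

# Escaping the whole string also escapes the colons and parens that were
# meant as syntax, so the field qualifiers and the grouping are lost.
escaped = escape_query_chars(query)
print(escaped)
```

The escaper cannot tell a colon the application wrote from a colon the
user typed, so it treats them all the same.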

The list of reserved characters is pretty unchanging, so I don't think
it's too much to ask the app layer to do the escaping. The app layer
knows (at least it had better know) which bits of the query were user
entered and what rules apply, such as whether the user can enter
field-qualified searches. Only armed with that knowledge can the right
thing be done, and Solr has no knowledge of those rules.
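
As a sketch of what I mean by doing it in the app layer, without pulling
in any Solr JARs: escape only the text the user typed, then wrap it in
the field qualifier yourself. This is Python, and escape_query_chars,
build_title_query and the RESERVED list are names I made up for
illustration. The list is modeled on what I understand
ClientUtils.escapeQueryChars() to escape, so treat it as an assumption.

```python
from urllib.parse import urlencode

# Assumed reserved-character list, modeled on SolrJ's
# ClientUtils.escapeQueryChars(); verify it for your Solr version.
RESERVED = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(text):
    """Backslash-escape every reserved character and all whitespace."""
    return "".join(
        "\\" + ch if ch in RESERVED or ch.isspace() else ch
        for ch in text
    )


def build_title_query(user_text):
    """Field-qualify in the app; escape only what the user typed."""
    return "title:(" + escape_query_chars(user_text) + ")"


user_text = "Apache: Solr Notes"
q = build_title_query(user_text)

# urlencode handles the HTTP-level encoding; that is a separate concern
# from query-syntax escaping and both are needed.
url = "http://localhost:8983/solr/db/select?" + urlencode(
    {"q": q, "q.op": "AND", "wt": "xml"}
)
print(q)
print(url)
```

The colon after "title" and the outer parens stay as syntax because the
application wrote them, while everything from the user is escaped. How
the escaped text then matches still depends on the field's analysis.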

If you insist that the client shouldn't deal with that, you could
always write a custom component that enforces the rules that are
particular to your setup. For instance, you may have a rule that you
can never field-qualify any term, in which case escaping on the Solr
side would work in _your_ situation. But the general case just doesn't
fit into the "escape on the Solr side" paradigm.

Best,
Erick


On Mon, Apr 20, 2015 at 9:55 AM, Steven White <swhite4...@gmail.com> wrote:
> Hi Erick,
>
> I didn't know about ClientUtils.escapeQueryChars(), this is good to know.
> Unfortunately I cannot use it because it means I have to import Solr
> classes into my client application.  I want to avoid that and keep a
> loose coupling between my application and Solr (just rely on REST).
>
> My suggestion is to add a new URL parameter to Solr, such as
> "q.ignoreOperators=[true | false]" (or some other name).  If this parameter
> is set to "false" or is missing, then the current behavior takes effect; if
> it is set to "true", then Solr will treat everything in the search string by
> first passing it to ClientUtils.escapeQueryChars().  This way, the client
> application doesn't have to: a) be tightly coupled with Solr (required to
> link with Solr JARs to use escapeQueryChars), and b) keep up with Solr when
> new operators are added.
>
> What do you think?
>
> Steve
>
> On Mon, Apr 20, 2015 at 12:41 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Steve:
>>
>> In short, no. There's no good way for Solr to solve this problem in
>> the _general_ case. Well, actually we could create parsers with rules
>> like "if the colon is inside a paren, escape it". Which would
>> completely break someone who wants to form queries like
>>
>> q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
>> followed by a colon (:)").
>>
>> You say: "A better solution would be to have Solr support a new
>> parameter that I can pass to Solr as part of the URL."
>>
>> How would Solr know _which_ parts of the URL to escape in the case above?
>>
>> You have to do this at the app layer as that's the only place that has
>> a clue what the peculiarities of the situation are.
>>
>> But if you're using SolrJ in your app layer, you can use
>> ClientUtils.escapeQueryChars() for user-entered data to do the
>> escaping without you having to maintain a separate list.
>>
>> Best,
>> Erick
>>
>> On Mon, Apr 20, 2015 at 8:39 AM, Steven White <swhite4...@gmail.com>
>> wrote:
>> > Hi Shawn,
>> >
>> > If the user types "title:(Apache: Solr Notes)" (without quotes) then I want
>> > Solr to treat the whole string as a raw text string, as if I had escaped
>> > ":", "(" and ")" and any other reserved Solr keywords / tokens.  Using
>> > dismax it worked for the ":" case, but I still get a SyntaxError if I pass
>> > it the following "title:(Apache: Solr Notes) AND" (here is the full URL):
>> >
>> >
>> > http://localhost:8983/solr/db/select?q=title:(Apache:%20Solr%20Notes)%20AND&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND&defType=dismax&qf=title
>> >
>> > So far, the only solution I can find is for my application to escape all
>> > Solr operators before sending the string to Solr.  This is fine, but it
>> > means my application will have to adapt to Solr's reserved operators as
>> > Solr grows (if Solr 5.x / 6.x adds a new operator, I have to add that to
>> > my application's escape list).  A better solution would be to have Solr
>> > support a new parameter that I can pass to Solr as part of the URL.
>> > This parameter will tell Solr to do the escaping for me or not (missing
>> > means the same as don't do the escaping).
>> >
>> > Thanks
>> >
>> > Steve
>> >
>> > On Mon, Apr 20, 2015 at 10:05 AM, Shawn Heisey <apa...@elyograg.org>
>> wrote:
>> >
>> >> On 4/20/2015 7:41 AM, Steven White wrote:
>> >> > In my application, a user types "Apache Solr Notes".  I take that text
>> >> > and send it over to Solr like so:
>> >> >
>> >> > http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND
>> >> >
>> >> > And I get a hit on "Apache Solr Release Notes".  This is all good.
>> >> >
>> >> > Now if the same user types "Apache: Solr Notes" (notice the ":" after
>> >> > "Apache") I will get a SyntaxError.  The fix is to escape ":" before I
>> >> > send it to Solr.  What I want to figure out is how can I tell Solr /
>> >> > Lucene to ignore ":" and escape it for me?  In this example, I used ":"
>> >> > but my need is for all other operators and reserved Solr / Lucene
>> >> > characters.
>> >>
>> >> If we assume that what you did for the first query is what you will do
>> >> for the second query, then this is what you would have sent:
>> >>
>> >> q=title:(Apache: Solr Notes)
>> >>
>> >> How is the parser supposed to know that only the second colon should be
>> >> escaped, and not the first one?  If you escape them both (or treat the
>> >> entire query string as query text), then the fact that you are searching
>> >> the "title" field is lost.  The text "title" becomes an actual part of
>> >> the query, and may not match, depending on what you have done with other
>> >> parameters, such as the default operator.
>> >>
>> >> If you use the dismax parser (*NOT* the edismax parser, which parses
>> >> field:value queries and boolean operator syntax just like the lucene
>> >> parser), you may be able to achieve what you're after.
>> >>
>> >>
>> >> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
>> >> https://wiki.apache.org/solr/DisMaxQParserPlugin
>> >>
>> >> With dismax, you would use the qf and possibly the pf parameter to tell
>> >> it which fields to search and send this as the query:
>> >>
>> >> q=Apache: Solr Notes
>> >>
>> >> Thanks,
>> >> Shawn
>> >>
>> >>
>>
