Hi Erick,

I think you missed my point. My request is that Solr support a new URL parameter. If this parameter is set, then EVERYTHING in q is treated as raw text (i.e., Solr does the escaping instead of the client).
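For context, the client-side escaping that such a parameter would make unnecessary can be sketched without importing any Solr classes. This is only a rough Python sketch: the reserved-character list is an assumption modeled on the Lucene/Solr query syntax and should be checked against the Solr version in use, and the helper names (escape_query_chars, build_title_query) are hypothetical. Only the user-entered text is escaped; the field qualifier and grouping parens are added by the application afterwards.

```python
from urllib.parse import urlencode

# ASSUMPTION: reserved characters modeled on the Lucene/Solr query syntax.
# Verify this list against the Solr release you actually run.
RESERVED = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(text: str) -> str:
    """Backslash-escape every reserved character in user-entered text."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def build_title_query(user_text: str) -> str:
    """Escape only the user's text, then add the field qualifier (hypothetical helper)."""
    return "title:(" + escape_query_chars(user_text) + ")"


user_text = "Apache: Solr Notes"
params = {"q": build_title_query(user_text), "q.op": "AND", "wt": "xml"}
url = "http://localhost:8983/solr/db/select?" + urlencode(params)
print(url)
```

Note that this handles reserved characters only. Bare operator words such as a trailing AND are not characters, so they would need separate handling in the application.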
Thanks

Steve

On Mon, Apr 20, 2015 at 1:08 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> How does that address the example query I gave?
>
> q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
> followed by a colon (:)").
>
> bq: "Solr will treat everything in the search string by first passing
> it to ClientUtils.escapeQueryChars()."
>
> would incorrectly escape the colons after field1, field, field2 and
> correctly escape the colon after d and in parens. And parens are a
> reserved character too, so it would incorrectly escape _all_ the
> parens except the ones surrounding the colon.
>
> The list of reserved characters is pretty unchanging, so I don't think
> it's too much to ask the app layer, which knows (at least it better
> know) which bits of the query were user entered, what rules apply as
> to whether the user can enter field-qualified searches etc. Only armed
> with that knowledge can the right thing be done, and Solr has no
> knowledge of those rules.
>
> If you insist that the client shouldn't deal with that, you could
> always write a custom component that enforces the rules that are
> particular to your setup. For instance, you may have a rule that you
> can never field-qualify any term, in which case escaping on the Solr
> side would work in _your_ situation. But the general case just doesn't
> fit into the "escape on the Solr side" paradigm.
>
> Best,
> Erick
>
> On Mon, Apr 20, 2015 at 9:55 AM, Steven White <swhite4...@gmail.com> wrote:
>
> > Hi Erick,
> >
> > I didn't know about ClientUtils.escapeQueryChars(), this is good to know.
> > Unfortunately I cannot use it because it means I have to import Solr
> > classes with my client application. I want to avoid that and create a
> > loose coupling between my application and Solr (just rely on REST).
> >
> > My suggestion is to add a new URL parameter to Solr, such as
> > "q.ignoreOperators=[true | false]" (or some other name). If this
> > parameter is set to "false" or is missing, then the current behavior
> > takes effect; if it is set to "true", then Solr will treat everything
> > in the search string by first passing it to
> > ClientUtils.escapeQueryChars(). This way, the client application
> > doesn't have to: a) be tightly coupled with Solr (required to link
> > with Solr JARs to use escapeQueryChars), and b) keep up with Solr when
> > new operators are added.
> >
> > What do you think?
> >
> > Steve
> >
> > On Mon, Apr 20, 2015 at 12:41 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> Steve:
> >>
> >> In short, no. There's no good way for Solr to solve this problem in
> >> the _general_ case. Well, actually we could create parsers with rules
> >> like "if the colon is inside a paren, escape it", which would
> >> completely break someone who wants to form queries like
> >>
> >> q=field1:whatever AND (a AND field:b) OR (field2:c AND "d: is a letter
> >> followed by a colon (:)").
> >>
> >> You say: "A better solution would be to have Solr support a new
> >> parameter that I can pass to Solr as part of the URL."
> >>
> >> How would Solr know _which_ parts of the URL to escape in the case above?
> >>
> >> You have to do this at the app layer as that's the only place that has
> >> a clue what the peculiarities of the situation are.
> >>
> >> But if you're using SolrJ in your app layer, you can use
> >> ClientUtils.escapeQueryChars() for user-entered data to do the
> >> escaping without you having to maintain a separate list.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Apr 20, 2015 at 8:39 AM, Steven White <swhite4...@gmail.com> wrote:
> >>
> >> > Hi Shawn,
> >> >
> >> > If the user types "title:(Apache: Solr Notes)" (without quotes) then I
> >> > want Solr to treat the whole string as a raw text string, as if I
> >> > escaped ":", "(" and ")" and any other reserved Solr keywords /
> >> > tokens. Using dismax it worked for the ":" case, but I still get
> >> > SyntaxError if I pass it the following "title:(Apache: Solr Notes) AND"
> >> > (here is the full URL):
> >> >
> >> > http://localhost:8983/solr/db/select?q=title:(Apache:%20Solr%20Notes)%20AND&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND&defType=dismax&qf=title
> >> >
> >> > So far, the only solution I can find is for my application to escape
> >> > all Solr operators before sending the string to Solr. This is fine,
> >> > but it means my application will have to adapt to Solr's reserved
> >> > operators as Solr grows (if Solr 5.x / 6.x adds a new operator, I have
> >> > to add that to my application's escape list). A better solution would
> >> > be to have Solr support a new parameter that I can pass to Solr as
> >> > part of the URL. This parameter will tell Solr to do the escaping for
> >> > me or not (missing means the same as don't do the escaping).
> >> >
> >> > Thanks
> >> >
> >> > Steve
> >> >
> >> > On Mon, Apr 20, 2015 at 10:05 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> >> >
> >> >> On 4/20/2015 7:41 AM, Steven White wrote:
> >> >> > In my application, a user types "Apache Solr Notes". I take that
> >> >> > text and send it over to Solr like so:
> >> >> >
> >> >> > http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND
> >> >> >
> >> >> > And I get a hit on "Apache Solr Release Notes". This is all good.
> >> >> >
> >> >> > Now if the same user types "Apache: Solr Notes" (notice the ":"
> >> >> > after "Apache") I will get a SyntaxError. The fix is to escape ":"
> >> >> > before I send it to Solr. What I want to figure out is how can I
> >> >> > tell Solr / Lucene to ignore ":" and escape it for me? In this
> >> >> > example, I used ":" but my need is for all other operators and
> >> >> > reserved Solr / Lucene characters.
> >> >>
> >> >> If we assume that what you did for the first query is what you will
> >> >> do for the second query, then this is what you would have sent:
> >> >>
> >> >> q=title:(Apache: Solr Notes)
> >> >>
> >> >> How is the parser supposed to know that only the second colon should
> >> >> be escaped, and not the first one? If you escape them both (or treat
> >> >> the entire query string as query text), then the fact that you are
> >> >> searching the "title" field is lost. The text "title" becomes an
> >> >> actual part of the query, and may not match, depending on what you
> >> >> have done with other parameters, such as the default operator.
> >> >>
> >> >> If you use the dismax parser (*NOT* the edismax parser, which parses
> >> >> field:value queries and boolean operator syntax just like the lucene
> >> >> parser), you may be able to achieve what you're after.
> >> >>
> >> >> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
> >> >> https://wiki.apache.org/solr/DisMaxQParserPlugin
> >> >>
> >> >> With dismax, you would use the qf and possibly the pf parameter to
> >> >> tell it which fields to search and send this as the query:
> >> >>
> >> >> q=Apache: Solr Notes
> >> >>
> >> >> Thanks,
> >> >> Shawn
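
The dismax request Shawn describes in the quoted thread could be assembled by the client roughly as follows. This is a minimal Python sketch, not a tested recipe: the helper name build_dismax_url is hypothetical, the host, core, and fl values are copied from the example URLs in the thread, and the sketch only builds the URL. It makes no claim about which inputs dismax will accept without a syntax error.

```python
from urllib.parse import urlencode


def build_dismax_url(base_url: str, user_text: str, fields: list) -> str:
    """Build a select URL that sends the raw user text to the dismax parser.

    Hypothetical helper: the field to search goes in qf, not in q,
    so the user's text is passed without a field qualifier.
    """
    params = {
        "q": user_text,              # raw user input, no "title:(...)" wrapper
        "defType": "dismax",         # dismax, not edismax
        "qf": " ".join(fields),      # fields to search
        "fl": "id,score,title",
        "wt": "xml",
    }
    return base_url + "?" + urlencode(params)


url = build_dismax_url(
    "http://localhost:8983/solr/db/select",
    "Apache: Solr Notes",
    ["title"],
)
print(url)
```

urlencode percent-encodes the colon and spaces in the user text, so the application does not need to hand-encode the query string.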