Has anybody tried tweaking AnalyzingSuggester with an ngram token filter to expand such infix queries?
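
To make the idea concrete, here is a rough Python sketch of what an ngram filter placed after a keyword tokenizer would emit, and why an infix then turns into a plain term lookup. It is only an illustration, not Lucene code: the function name, the gram sizes (2 to 4) and the sample value are my own assumptions, and the order in which the real filter emits grams is not modelled.

```python
def ngrams(value, min_gram=2, max_gram=4):
    """Return every substring of value whose length is in [min_gram, max_gram].

    Hypothetical helper: it mimics the effect of an ngram filter applied to a
    single keyword token, not the filter's actual emission order.
    """
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(value) - size + 1):
            grams.append(value[start:start + size])
    return grams


# Index side: the whole field value stays one token, punctuation included,
# and is then expanded into grams.
indexed_terms = set(ngrams("foo:bar[1]"))

# Query side: an infix whose length falls inside the gram range is found by
# an ordinary term lookup, with no wildcard or regex involved.
print("o:b" in indexed_terms)

# The price is index size: one 10-character value already yields this many grams.
print(len(ngrams("foo:bar[1]")))
```

An infix longer than max_gram would not be found by a single lookup in this sketch, which is one reason the gram sizes need to be chosen with the expected queries in mind.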

On Thu, Feb 2, 2017 at 6:55 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 2/2/2017 8:15 AM, Maciej Ł. PCSS wrote:
> > regardless of the value of such a use-case, there is another thing
> > that stays unknown for me.
> >
> > Does SOLR support a simple and silly 'exact substring match'? I mean,
> > is it possible to search for (actually filter by) a raw substring
> > without tokenization and without any kind of processing/simplifying
> > the searched information? By a 'raw substring' I mean a character
> > string that, among others, can contain non-letters (colons, brackets,
> > etc.) - basically everything the user is able to input via keyboard.
> >
> > Does this use case meet SOLR technical possibilities even if that
> > means a big efficiency cost?
>
> Because you want to do substring matches, things are somewhat more
> complicated than if you wanted to do a full exact-string-only query.
>
> First I'll tackle the full exact query idea, because the info is also
> important for substrings:
>
> If the class in the fieldType is "solr.StrField" then the input will be
> indexed exactly as it is sent, all characters preserved, and all
> characters needing to be in the query.
>
> On the query side, you would need to escape any special characters in
> the query string -- spaces, colons, and several other characters.
> Escaping is done with the backslash. If you are manually constructing
> URL parameters for an HTTP request, you would also need to be aware of
> URL encoding. Some Solr libraries (like SolrJ) are capable of handling
> all the URL encoding for you.
>
> Matching *substrings* with StrField would involve either a regular
> expression query (with .* before and after) or a wildcard query, which
> Erick described in his reply.
>
> An alternate way to do substring matches is the NGram or EdgeNGram
> filters, and not using wildcards or regex. This method will increase
> your index size, possibly by a large amount. To use this method, you'd
> need to switch back to solr.TextField, use the keyword tokenizer, and
> then follow that with the appropriate NGram filter. Depending on your
> exact needs, you might only do the NGram filter on the index side, or
> you might need it on both index and query analysis. Escaping special
> characters on the query side would still be required.
>
> The full list of characters that require escaping is at the end of this
> page:
>
> http://lucene.apache.org/core/6_4_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html?is-external=true#Escaping_Special_Characters
>
> Note that it shows && and || as special characters, even though these
> are in fact two characters each. Typically even a single instance of
> these characters requires escaping. Solr will also need spaces to be
> escaped.
>
> Thanks,
> Shawn

--
Sincerely yours
Mikhail Khludnev
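
P.S. For the escaping and URL-encoding steps in Shawn's quoted message, a minimal Python sketch follows. It assumes the special-character list from the linked query parser page, escapes every & and | individually (per the remark that even a single instance typically needs it), and treats any whitespace as needing a backslash. The helper names and the field name title_s are made up for illustration, and none of this has been checked against a live Solr instance.

```python
from urllib.parse import urlencode

# Characters the classic query parser treats as syntax. '&' and '|' are
# listed singly so that each occurrence gets its own backslash.
SPECIAL_CHARS = set('\\+-!(){}[]^"~*?:/&|')


def escape_query_chars(text):
    """Backslash-escape query syntax characters and whitespace in text."""
    return "".join(
        "\\" + ch if ch in SPECIAL_CHARS or ch.isspace() else ch
        for ch in text
    )


def exact_match_params(field, raw_value):
    """Build a URL-encoded q parameter for an exact match on a StrField."""
    return urlencode({"q": field + ":" + escape_query_chars(raw_value)})


def substring_match_params(field, raw_substring):
    """Same as above, wrapped in wildcards for a substring match.

    The surrounding '*' are added after escaping so they stay wildcards.
    """
    escaped = escape_query_chars(raw_substring)
    return urlencode({"q": field + ":*" + escaped + "*"})


# 'title_s' is a hypothetical StrField name.
print(escape_query_chars("a:b (c)"))
print(exact_match_params("title_s", "a:b"))
print(substring_match_params("title_s", "[1]"))
```

Escaping and URL encoding are kept as two separate steps on purpose: the backslashes are for the query parser, the percent-encoding is only for the HTTP request. A client library such as SolrJ would take care of the second step.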