Re: SQL-like queries (with percent character) - matching an exact substring, with parts of words

Mikhail Khludnev Thu, 02 Feb 2017 12:18:07 -0800

Has anybody tried tweaking AnalyzingSuggester with an ngram token filter to
expand such infix queries?


On Thu, Feb 2, 2017 at 6:55 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 2/2/2017 8:15 AM, Maciej Ł. PCSS wrote:
> > regardless of the value of such a use-case, there is another thing
> > that remains unclear to me.
> >
> > Does SOLR support a simple and silly 'exact substring match'? I mean,
> > is it possible to search for (actually filter by) a raw substring
> > without tokenization and without any kind of processing/simplifying
> > the searched information? By a 'raw substring' I mean a character
> > string that, among others, can contain non-letters (colons, brackets,
> > etc.) - basically everything the user is able to input via keyboard.
> >
> > Is this use case within SOLR's technical capabilities, even if that
> > means a big efficiency cost?
>
> Because you want to do substring matches, things are somewhat more
> complicated than if you wanted to do a full exact-string-only query.
>
> First I'll tackle the full exact query idea, because the info is also
> important for substrings:
>
> If the class in the fieldType is "solr.StrField" then the input will be
> indexed exactly as it is sent, with all characters preserved, and the
> query must supply every one of those characters to match.
>
> On the query side, you would need to escape any special characters in
> the query string -- spaces, colons, and several other characters.
> Escaping is done with the backslash.  If you are manually constructing
> URL parameters for an HTTP request, you would also need to be aware of
> URL encoding.  Some Solr libraries (like SolrJ) are capable of handling
> all the URL encoding for you.
>
> Matching *substrings* with StrField would involve either a regular
> expression query (with .* before and after) or a wildcard query, which
> Erick described in his reply.
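
A rough sketch of the escaping and wildcard approach described above, for
anyone building the request by hand. The character set is taken from the
classic query parser's documented list plus whitespace; the field name
"title_exact" and both helper names are hypothetical, and escaping "&" and
"|" one character at a time is an assumption based on the note about single
instances further down.

```python
from urllib.parse import urlencode

# Syntax characters from the classic query parser's list. "&" and "|" are
# escaped individually (assumption), and whitespace is handled separately.
SPECIAL_CHARS = set('\\+-!(){}[]^"~*?:/&|')


def escape_query_chars(text: str) -> str:
    """Backslash-escape every special character and every whitespace char."""
    out = []
    for ch in text:
        if ch in SPECIAL_CHARS or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


def substring_query_params(field: str, raw: str) -> str:
    """Build URL-encoded parameters for a wildcard substring filter.

    The user's input is escaped first; the leading and trailing "*" are
    added afterwards so they keep their wildcard meaning.
    """
    fq = f"{field}:*{escape_query_chars(raw)}*"
    return urlencode({"q": "*:*", "fq": fq})


if __name__ == "__main__":
    # "title_exact" is a made-up StrField name used only for illustration.
    print(substring_query_params("title_exact", "foo: (bar)"))
```

The escaping step and the URL-encoding step are kept separate on purpose:
the first protects the query parser, the second protects the HTTP request.
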
>
> An alternate way to do substring matches is the NGram or EdgeNGram
> filters, and not using wildcards or regex.  This method will increase
> your index size, possibly by a large amount.  To use this method, you'd
> need to switch back to solr.TextField, use the keyword tokenizer, and
> then follow that with the appropriate NGram filter.  Depending on your
> exact needs, you might only do the NGram filter on the index side, or
> you might need it on both index and query analysis.  Escaping special
> characters on the query side would still be required.
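
To illustrate why the NGram approach turns a substring match into a plain
term lookup (and why it inflates the index), here is a small pure-Python
model. It is not Solr code: the function names and the min_gram/max_gram
parameters are made up for the sketch, and it models a keyword tokenizer
followed by an index-side-only NGram filter.

```python
def ngrams(value: str, min_gram: int, max_gram: int) -> set:
    """All substrings of `value` whose length is in [min_gram, max_gram]."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(value) - size + 1):
            grams.add(value[start:start + size])
    return grams


def build_index(docs: dict, min_gram: int = 2, max_gram: int = 5) -> dict:
    """Map each gram to the ids of the documents that contain it.

    The whole field value is treated as one token (keyword tokenizer),
    so punctuation and spaces end up inside the grams unchanged.
    """
    index = {}
    for doc_id, value in docs.items():
        for gram in ngrams(value, min_gram, max_gram):
            index.setdefault(gram, set()).add(doc_id)
    return index


def substring_search(index: dict, needle: str) -> set:
    """Exact term lookup; no wildcard or regex is involved."""
    return index.get(needle, set())


if __name__ == "__main__":
    docs = {1: "foo:(bar)", 2: "foobar"}
    index = build_index(docs)
    print(substring_search(index, "o:(b"))
    print(len(index), "terms indexed for", len(docs), "documents")
```

In this model a needle shorter than min_gram or longer than max_gram finds
nothing, which is one reason the gram sizes (and whether to also apply the
filter at query time) depend on the exact needs.
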
>
> The full list of characters that require escaping is at the end of this
> page:
>
> http://lucene.apache.org/core/6_4_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html?is-external=true#Escaping_Special_Characters
>
> Note that it shows && and || as special characters, even though these
> are in fact two characters each.  Typically even a single instance of
> these characters requires escaping.  Solr will also need spaces to be
> escaped.
>
> Thanks,
> Shawn
>
>


-- 
Sincerely yours
Mikhail Khludnev
