Hi All

I have been trying to add multi-word synonyms, using `sow=false`, to a
pre-existing system that pre-processes the search phrase to wrap each term
in wildcards, e.g. `bread stick` => `*bread* *stick*`.
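
For reference, the existing pre-processing step behaves roughly like the
sketch below. The function name `wrap_wildcards` is hypothetical, and the
sketch assumes terms are split on whitespace only; the real pre-processor
may handle punctuation or quoting differently.

```python
def wrap_wildcards(phrase: str) -> str:
    """Wrap every whitespace-separated term in leading and trailing wildcards."""
    # Assumption: terms are separated by whitespace and need no escaping.
    return " ".join(f"*{term}*" for term in phrase.split())


print(wrap_wildcards("bread stick"))  # *bread* *stick*
```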

I got synonym expansion working perfectly after discovering the
`preserveOriginal` filter param, but then I needed to re-implement the
existing wildcard behaviour.
I tried using the edge-ngram filter, but found that searching for the
phrase `bread stick` with `q.op=AND`, against a field containing the word
`breadstick`, returns no results, because the content `breadstick` does not
_start with_ `stick`. The previous wildcard behaviour would return all
documents that contain the substrings `bread` AND `stick`, which is the
desired behaviour.
I tried using the ngram filter, but this does not support
`preserveOriginal`, and so loses a lot of relevance for exact matches. It
also produces matches that are far too broad: with `minGramSize=3` and
`maxGramSize=5` it creates 21 tokens from `breadstick`, which in practice
match essentially all of the documents. This means that boosts applied to
other fields, such as 'in stock', push irrelevant documents to the top.
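
To make the difference between the two filters concrete, here is a small
simulation in plain Python. It is not Lucene code; it only approximates
which tokens the filters emit (token order and positions are ignored). The
edge-ngram sizes of 3 to 10 are an assumption for illustration, as the
sizes I actually used are not important to the point.

```python
def ngrams(term: str, min_size: int, max_size: int) -> list[str]:
    """All substrings of term with length between min_size and max_size."""
    return [
        term[start:start + size]
        for size in range(min_size, max_size + 1)
        for start in range(len(term) - size + 1)
    ]


def edge_ngrams(term: str, min_size: int, max_size: int) -> list[str]:
    """Only the prefixes of term with length between min_size and max_size."""
    return [term[:size] for size in range(min_size, min(max_size, len(term)) + 1)]


grams = ngrams("breadstick", 3, 5)
print(len(grams))         # 21
print("stick" in grams)   # True: full ngrams match any substring

edges = edge_ngrams("breadstick", 3, 10)
print("bread" in edges)   # True: `bread` is a prefix
print("stick" in edges)   # False: `stick` is not a prefix, so AND fails
```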

Finally, I tried to strip out ngrams entirely and use nested subqueries
via local params syntax, a Solr feature that is not very well documented.
I created something like `q={!edismax sow=true v=$wildcards} OR {!edismax
sow=false v=$plain}` to effectively create a union of results, one with
multi-word synonym support and one with wildcard support.
But then I had to carry over the other edismax params and immediately
stumbled.
Each query in production normally has a slew of `bf` and `bq` params, and I
cannot see a way to pass these into the nested queries using local params.
If I have 3 different `bf` params, how can I pass them into the local param
subqueries?
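
For clarity, this is roughly how the request for the union query is put
together. It is only a sketch that mirrors the query string as written
above; the helper name `build_union_params` is hypothetical, and it
deliberately leaves out `bf`, `bq` and `qf`, since passing those into the
subqueries is exactly the part I cannot work out.

```python
from urllib.parse import urlencode


def build_union_params(wildcard_phrase: str, plain_phrase: str) -> dict[str, str]:
    """Request params for the two nested edismax queries joined with OR."""
    return {
        # Each subquery reads its query text from a separate request param.
        "q": "{!edismax sow=true v=$wildcards} OR {!edismax sow=false v=$plain}",
        "wildcards": wildcard_phrase,  # pre-processed phrase with wildcards
        "plain": plain_phrase,         # untouched phrase for synonym expansion
    }


params = build_union_params("*bread* *stick*", "bread stick")
query_string = urlencode(params)  # URL-encoded, ready to append to /select?
print(query_string)
```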

Also, because the search in production spans multiple fields, I tried
passing `qf` to both subqueries using parameter dereferencing. This failed:
the parser treated the value as a single field and threw a 'number format
exception'.
For example:
q={!edismax sow=true v=$tw tf=$tqf} OR {!edismax sow=false v=$tp tf=$tqf}
$tw=*bread* *stick*
$tp=bread stick
$tqf=title^2 description^0.5

As you can guess, I have spent quite some time going down this rabbit hole
in my attempt to reproduce the existing desired functionality alongside
multiterm synonyms.
Is there a way to get multi-term synonyms working effectively alongside
substring matching?
I am sure I am missing a much simpler approach than any of my attempts so
far.

Solr: 8.3

Thanks
Martin Graney

-- 
 <https://www.linkedin.com/company/sooqr-com/>
