You should not leave it in the qf field. You’re getting confused by the difference between query _parsing_ and the analysis chain. The parsing turns your top-level query of “ice cream” (assuming it is entered without quotes) into something like
f1:ice f1:cream f2:ice f2:cream

This is happening way before analysis takes over. What you need is for both “ice” and “cream” to be passed as a unit to the analysis chain, and if you rely on the qf parameter it won’t happen.

Best,
Erick

> On Sep 30, 2019, at 7:24 PM, Ashwin Ramesh <ash...@canva.com> wrote:
>
> Thanks Erick, that seems to work!
>
> Should I leave it in qf also? For example the query "blue dog" may be
> represented as separate tokens in the keyword index.
>
> On Mon, Sep 30, 2019 at 9:32 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Have you tried taking your keyword field out of the “qf” param and adding
>> it explicitly? As keyword:"ice cream"
>>
>> Best,
>> Erick
>>
>>> On Sep 30, 2019, at 5:27 AM, Ashwin Ramesh <ash...@canva.com> wrote:
>>>
>>> Hi everybody,
>>>
>>> I am using the edismax parser and have noticed a very specific behaviour
>>> with how sow=true (default) handles multiword keywords.
>>>
>>> We have a field called 'keywords', which uses the general
>>> KeywordTokenizerFactory. There are also other text fields like title and
>>> description, etc.
>>>
>>> When we index a document with a keyword "ice cream", for example, we know
>>> it gets indexed into that field as "ice cream".
>>>
>>> However, at query time, I noticed that if we run an Edismax query:
>>> q=ice cream
>>> qf=keywords
>>>
>>> I do not get that document back as a match. This is due to sow=true
>>> splitting the user's query and the final tokens not being present in the
>>> keywords field.
>>>
>>> I was wondering what the best practise around this was? Some thoughts I
>>> have had:
>>>
>>> 1. Index multi-word keywords with hyphens or something similar. E.g. "ice
>>> cream" -> "ice-cream"
>>> 2. Additionally index the separate words as keywords also. E.g. "ice
>>> cream" -> "ice cream", "ice", "cream". However this method will result in
>>> the loss of intent (q=ice would return this document).
>>> 3. Add a boost query which is an edismax query where we explicitly set
>>> sow=false and add a huge boost. E.g. bq={!edismax qf=keywords^1000
>>> sow=false bq="" boost="" pf="" tie=1.00 v="ice cream"}
>>>
>>> Is there an industry practise solution to handle this type of problem?
>>> Keep in mind that the other text fields may also include these terms. E.g.
>>> title="This is ice cream", which would match the query. This specific
>>> problem affects the keywords field for the obvious reason that the
>>> indexing pipeline does not tokenize keywords.
>>>
>>> Thank you for all your amazing help,
>>>
>>> Regards,
>>>
>>> Ash
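
For anyone finding this thread later, the parsing-versus-analysis point above can be illustrated with a toy Python model. This is only a sketch and not Solr code: the function names (sow_clauses, keyword_tokens, matches) are made up for illustration, and real edismax builds a more elaborate query than this flat list of clauses. It only shows why whitespace splitting at parse time defeats a field that indexes "ice cream" as a single token.

```python
def sow_clauses(q, qf):
    """Toy model of sow=true: the parser splits the query on whitespace
    and emits one (field, term) clause per field in qf and per term."""
    return [(field, term) for field in qf for term in q.split()]


def keyword_tokens(text):
    """Toy stand-in for a keyword-style tokenizer: the whole input
    becomes exactly one token."""
    return [text]


def matches(indexed_value, field, clauses):
    """True if any clause on `field` analyzes to a token that is present
    in the (toy) index for that field."""
    indexed = set(keyword_tokens(indexed_value))
    return any(
        f == field and keyword_tokens(term)[0] in indexed
        for f, term in clauses
    )


# q=ice cream with qf=keywords: the split happens before analysis,
# so the analyzer only ever sees "ice" and "cream" separately.
split = sow_clauses("ice cream", ["keywords"])

# keywords:"ice cream" given explicitly: the phrase reaches analysis as a unit.
explicit = [("keywords", "ice cream")]

hit_split = matches("ice cream", "keywords", split)
hit_explicit = matches("ice cream", "keywords", explicit)
print(split, hit_split, hit_explicit)
```

In this model the split clauses never equal the single indexed token, while the explicit clause does, which is the same reason leaving the field in qf does not help.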