> On Aug 31, 2019, at 12:00 PM, Toke Eskildsen <t...@kb.dk> wrote:
>
> Whenever we do this normalisation, we index two versions in our index: A very
> lightly normalised (lowercased) field and a heavily normalised field: If a
> record has a title "Köket" (kitchen in Swedish), we store title_orig:köket
> and title_norm:køket. […] Going with what we do, my answer would be: Yes, do
> preserve and also remove :-)
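A minimal Python sketch of the two-field approach in the quote, for illustration only. The field names title_orig and title_norm come from the quote. The mapping table is an assumption: only ö → ø is taken from the "Köket" example, and ä → æ is an illustrative guess. The sketch builds plain field values; it is not a Solr or Lucene analyzer.

```python
import unicodedata

# Assumed mapping for the heavily normalised field. Only ö -> ø comes from
# the "Köket" example; ä -> æ is an illustrative guess.
HEAVY_MAP = str.maketrans({"ö": "ø", "ä": "æ"})


def index_title(title: str) -> dict[str, str]:
    """Build both field values for one record (hypothetical helper)."""
    # Compose first so "o" + combining diaeresis becomes a single "ö".
    composed = unicodedata.normalize("NFC", title)
    light = composed.lower()            # title_orig: lowercased only
    heavy = light.translate(HEAVY_MAP)  # title_norm: umlauts folded
    return {"title_orig": light, "title_norm": heavy}


print(index_title("Köket"))
# {'title_orig': 'köket', 'title_norm': 'køket'}
```

Querying both fields then lets an exact "köket" match and a folded "køket" match both find the record.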
Right after I posted, I realized that I wanted to say “include all” as an option. They can even be in the same field with synonyms at the same token position.

Also, don’t worry too much about creating junk terms in the index with nonsense transliterations. Terms are cheap in search indexes (up to a point). So it really is OK to have all of these indexed at the same position, even if the last one is garbage. This still has the schön/schon problem, but at least there is a match.

coöperation
cooperation
cooepoeration (typewriter umlaut version)

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
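The "include all" idea can be sketched in plain Python under some stated assumptions. The helper names strip_marks, transliterate_umlauts and expand are hypothetical, the German-style ö → oe rule is assumed, and the typewriter variant is not generated. Each token yields (position, term) pairs, and every variant of a token shares that token's position. In Lucene-based engines, a token filter typically gets the same effect by emitting the extra terms with a position increment of 0, the way synonyms are stacked.

```python
import unicodedata


def strip_marks(term: str) -> str:
    """Drop combining marks: 'coöperation' -> 'cooperation', 'schön' -> 'schon'."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def transliterate_umlauts(term: str) -> str:
    """Assumed German-style rule: ö -> oe, ä -> ae, ü -> ue."""
    composed = unicodedata.normalize("NFC", term)
    return composed.replace("ö", "oe").replace("ä", "ae").replace("ü", "ue")


def expand(tokens: list[str]) -> list[tuple[int, str]]:
    """Return (position, term) pairs; all variants of a token share its position."""
    postings = []
    for pos, tok in enumerate(tokens):
        tok = unicodedata.normalize("NFC", tok.lower())
        variants = (tok, strip_marks(tok), transliterate_umlauts(tok))
        # dict.fromkeys keeps insertion order and drops duplicate variants
        for variant in dict.fromkeys(variants):
            postings.append((pos, variant))
    return postings


print(expand(["Coöperation"]))
# [(0, 'coöperation'), (0, 'cooperation'), (0, 'cooeperation')]
```

Because "schön" also produces "schon" at the same position, a query for either word matches. That is the schön/schon conflation, accepted in exchange for always getting a match.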