One outcome of natural language processing research over the last decade is the representation of words as numeric vectors, called word embeddings. These are used in large language models such as BERT, ELMo, and (Chat)GPT. A remarkable property of these vectors is that they cluster semantically: words for similar concepts (dog, puppy, pet) group together even though their spellings are very different. This can be used for "semantic" search. If a search query (dog) is converted to a vector, it can be compared against vectors for terms found in documents (e.g. wiki articles), retrieving documents with similar content even when the text itself doesn't match.
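A minimal sketch of the idea in Python, using hand-made toy vectors (real embeddings come from trained models such as word2vec or BERT; these numbers are purely illustrative):

```python
import math

# Toy 3-dimensional "embeddings" -- real models use hundreds of
# dimensions, and the values come from training, not by hand.
embeddings = {
    "dog":    [0.90, 0.80, 0.10],
    "puppy":  [0.85, 0.75, 0.15],
    "pet":    [0.70, 0.90, 0.20],
    "banana": [0.10, 0.20, 0.90],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semantic_search(query, k=2):
    """Rank all other terms by similarity to the query vector."""
    q = embeddings[query]
    scored = [(word, cosine_similarity(q, v))
              for word, v in embeddings.items() if word != query]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]

print(semantic_search("dog"))
```

Here "puppy" and "pet" rank far above "banana" for the query "dog", even though none of the strings share any characters. Production systems do the same thing at scale with approximate nearest-neighbour indexes instead of a linear scan.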
https://en.wikipedia.org/wiki/Word_embedding

Here are just two of the very many videos that explain the concept:
https://www.youtube.com/watch?v=xzHhZh7F25I
https://www.youtube.com/watch?v=MUve9LiEAeI

Is there any ongoing work at the WMF or around the MediaWiki software to apply this new technique to search in Wikipedia?

-- 
Lars Aronsson (l...@aronsson.se, user:LA2)
Linköping, Sweden

Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org