One notable outcome of research in natural language processing
over the last decade is the representation of words as numeric
vectors, called word embeddings. These are used in large language
models such as BERT, ELMo, and (Chat)GPT. A peculiar property of
these vectors is that they cluster semantically, so that words
for similar concepts (dog, puppy, pet) group together even though
their spellings are very different. This can be used for
"semantic" search: if a search query (dog) is converted to a
vector, it can be compared against the vectors of terms found in
documents (e.g. wiki articles), finding documents of similar
content even though the text itself doesn't match.
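
To make the idea concrete, here is a minimal sketch of such a
lookup in Python. The vectors are made up for illustration (real
embeddings from models like BERT have hundreds of dimensions),
and cosine similarity is one common choice of distance measure:

    # Minimal sketch of embedding-based ("semantic") search.
    # The 3-dimensional vectors below are invented for illustration;
    # real word embeddings have hundreds of dimensions.
    import numpy as np

    # Hypothetical embeddings: semantically similar words sit close
    # together in vector space.
    embeddings = {
        "dog":   np.array([0.90, 0.80, 0.10]),
        "puppy": np.array([0.85, 0.75, 0.15]),
        "pet":   np.array([0.80, 0.70, 0.20]),
        "car":   np.array([0.10, 0.20, 0.90]),
    }

    def cosine_similarity(a, b):
        # Similarity of direction, ignoring length; 1.0 = same direction.
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def search(query, top_k=3):
        # Rank all indexed terms by similarity to the query's vector.
        q = embeddings[query]
        scores = {term: cosine_similarity(q, v)
                  for term, v in embeddings.items() if term != query}
        return sorted(scores.items(), key=lambda kv: kv[1],
                      reverse=True)[:top_k]

    print(search("dog"))
    # "puppy" and "pet" rank far above "car", even though none of
    # them shares any spelling with "dog".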

https://en.wikipedia.org/wiki/Word_embedding

Here are just two of the many videos that explain the concept:
https://www.youtube.com/watch?v=xzHhZh7F25I
https://www.youtube.com/watch?v=MUve9LiEAeI

Is there any ongoing work at the WMF or around the MediaWiki
software to apply this new technique to search in Wikipedia?


--
  Lars Aronsson (l...@aronsson.se, user:LA2)
  Linköping, Sweden