We previously released lexical masks as ShEx files. These are schemata
for lexicographic forms that can be used to validate whether the data is
complete.

We found that it was quite challenging to turn these ShEx files into forms
for entering the data, such as Lucas Werkmeister’s Lexeme Forms. So we
adapted our approach slightly: we now publish JSON files that keep the
structures in a format that is easier to parse and understand, and we also
provide a script that translates these JSON files into ShEx Entity Schemas.
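
To give a rough idea of what such a translation involves, here is a
minimal sketch in Python. It is not the published script. The JSON layout
(the keys "language", "lexical_category", "forms", "label", "features"),
the function name mask_to_shex, and the shape names are all hypothetical
and chosen only for illustration; the generated text is ShEx-like and has
not been checked against a ShEx validator.

```python
import json

# Hypothetical mask layout, NOT the published Wikidata format: a language,
# a lexical category, and the forms a complete lexeme is expected to have.
EXAMPLE_MASK = json.loads("""
{
  "language": "Q1860",
  "lexical_category": "Q1084",
  "forms": [
    {"label": "singular", "features": ["Q110786"]},
    {"label": "plural", "features": ["Q146786"]}
  ]
}
""")

PREFIXES = [
    "PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>",
    "PREFIX wikibase: <http://wikiba.se/ontology#>",
    "PREFIX dct: <http://purl.org/dc/terms/>",
    "PREFIX wd: <http://www.wikidata.org/entity/>",
]


def mask_to_shex(mask, shape_name="mask"):
    """Render the hypothetical JSON mask as ShEx-like text."""
    forms = mask["forms"]
    # One constraint each for language and category, one per expected form.
    constraints = [
        f"dct:language [ wd:{mask['language']} ]",
        f"wikibase:lexicalCategory [ wd:{mask['lexical_category']} ]",
    ]
    constraints += [f"ontolex:lexicalForm @<form_{f['label']}>" for f in forms]

    out = list(PREFIXES)
    out += ["", f"start = @<{shape_name}>", "", f"<{shape_name}> {{"]
    out.append(" ;\n".join("  " + c for c in constraints))
    out.append("}")

    # Each form gets its own shape listing its grammatical features.
    for f in forms:
        features = " ;\n".join(
            f"  wikibase:grammaticalFeature [ wd:{q} ]" for q in f["features"]
        )
        out += ["", f"<form_{f['label']}> {{", features, "}"]
    return "\n".join(out)


if __name__ == "__main__":
    print(mask_to_shex(EXAMPLE_MASK))
```

The point of the sketch is only that a flat JSON structure like this is
easy to read both for a form generator and for a schema generator, whereas
recovering the same structure from ShEx text requires a ShEx parser.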

Furthermore, we published more masks for more languages and parts of speech
than before.

Full documentation can be found on the wiki:
https://www.wikidata.org/wiki/Wikidata:Lexical_Masks#Paper

Background can be found in the paper:
https://www.aclweb.org/anthology/2020.lrec-1.372/

Thanks, Bruno, Saran, and Daniel, for your great work!