GoranSMilovanovic added a comment.

  @Lydia_Pintscher @RazShuty
  
  Something to begin with:
  
  - each node is a language (Wikimedia language codes
    <https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all> are
    used);
  - each language points towards the three languages most similar to it, in
    terms of the overlap in their labels across >57M Wikidata items;
  - explanation: for each language we find which WD items have a label in it;
    the similarity between two languages is then the Jaccard index
    <https://en.wikipedia.org/wiki/Jaccard_index> (i.e. 1 minus the Jaccard
    distance) between the two resulting binary vectors, each of length
    approx. 57M.
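  A minimal Python sketch of the similarity computation described above. It
  assumes each language's labelled items are available as a set of item IDs
  (a set of QIDs is equivalent to the sparse binary vector of length ~57M).
  The function names and the toy data are hypothetical illustrations, not the
  code that produced F30078182.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def top_k_similar(labels: dict[str, set[str]], k: int = 3) -> dict[str, list[str]]:
    """For each language, return the k languages with the highest Jaccard index.

    labels maps a (hypothetical) language code to the set of item IDs that
    have a label in that language. Ties are broken alphabetically.
    """
    scores: dict[str, list[tuple[float, str]]] = {lang: [] for lang in labels}
    # Each unordered pair is scored once and recorded for both languages.
    for l1, l2 in combinations(labels, 2):
        s = jaccard(labels[l1], labels[l2])
        scores[l1].append((s, l2))
        scores[l2].append((s, l1))
    return {
        lang: [other for _, other in sorted(pairs, key=lambda p: (-p[0], p[1]))[:k]]
        for lang, pairs in scores.items()
    }


# Hypothetical toy data: which items carry a label in each language.
toy_labels = {
    "en": {"Q1", "Q2", "Q3", "Q4"},
    "de": {"Q1", "Q2", "Q3"},
    "fr": {"Q1", "Q2", "Q4"},
    "nl": {"Q1", "Q3"},
    "ja": {"Q5"},
}

if __name__ == "__main__":
    for lang, nearest in top_k_similar(toy_labels).items():
        print(lang, "->", nearest)
```

  At the full scale (>57M items, hundreds of languages) the pairwise set
  operations would be slow; a sparse binary item-by-language matrix X lets
  all pairwise intersections be computed at once as X.T @ X, with the unions
  following from the column sums.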
  
  F30078182: WD_Languages.png <https://phabricator.wikimedia.org/F30078182>
  
  Mapping WDCM item re-use statistics onto languages now.

TASK DETAIL
  https://phabricator.wikimedia.org/T223119

To: GoranSMilovanovic
Cc: Aklapper, Lydia_Pintscher, RazShuty, GoranSMilovanovic, darthmon_wmde, 
DannyS712, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, 
Wikidata-bugs, aude, Mbch331