dcausse added a comment.

  Thanks for this!
  These are mostly `sitelinks`, I think. The information behind them could 
perhaps be moved into its own context, but I'm not sure that's necessary.
  It also shows that Wikidata may still have duplicated sitelinks, which is not 
good. It might be interesting to extract them: the object (and the context, to 
know which entities are affected) of the 1049 duplicates you found. Simply 
taking the http://www.w3.org/1999/02/22-rdf-syntax-ns#type triples should be 
enough, since all the others should point to the same object.
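
  A rough sketch of that extraction, assuming the data is available as quads 
whose context names the entity a triple was exported under. The `Quad` record, 
`duplicated_type_triples` and the sample URIs are hypothetical, not part of any 
existing tool. It keeps only the rdf:type triples and reports those that appear 
under more than one context:

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Quad(NamedTuple):
    # Hypothetical record: one triple plus the context (named graph) it was
    # found in; the context is assumed to identify the owning entity.
    subject: str
    predicate: str
    obj: str
    context: str


def duplicated_type_triples(
    quads: Iterable[Quad],
) -> Dict[Tuple[str, str], Set[str]]:
    """Map (subject, object) of each rdf:type triple seen in more than one
    context to the set of contexts (i.e. affected entities) it appears in."""
    contexts: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for q in quads:
        # Only rdf:type is needed; other predicates of a duplicated sitelink
        # are expected to repeat alongside it.
        if q.predicate == RDF_TYPE:
            contexts[(q.subject, q.obj)].add(q.context)
    return {key: ctxs for key, ctxs in contexts.items() if len(ctxs) > 1}


if __name__ == "__main__":
    # Hypothetical sample data: one sitelink exported under two entities.
    article = "https://en.wikipedia.org/wiki/Example"
    sample = [
        Quad(article, RDF_TYPE, "http://schema.org/Article",
             "http://www.wikidata.org/entity/Q1"),
        Quad(article, RDF_TYPE, "http://schema.org/Article",
             "http://www.wikidata.org/entity/Q2"),
        Quad("https://en.wikipedia.org/wiki/Other", RDF_TYPE,
             "http://schema.org/Article", "http://www.wikidata.org/entity/Q3"),
    ]
    for (subj, obj), ctxs in duplicated_type_triples(sample).items():
        print(subj, obj, sorted(ctxs))
```

  Because contexts are collected in a set, a triple repeated within the same 
context is not reported; only cross-entity duplicates show up, which is what a 
cleanup list would need.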
  
  This list would help Wikidata folks clean up the duplicated sitelinks, but 
the duplicates could also be an artifact of the export process: the export is 
slow enough that Wikidata may change while it runs, causing duplicates to 
appear in the dump.
  
  FWIW, this relates to T44325 <https://phabricator.wikimedia.org/T44325>

TASK DETAIL
  https://phabricator.wikimedia.org/T289754

To: dcausse
Cc: JAllemandou, Aklapper, dcausse, AKhatun_WMF, Invadibot, MPhamWMF, 
maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
_______________________________________________
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org